Monitoring inferer health

On “what is small”: I think we can quantify that statistically, in terms of the expected standard deviation, covariance, etc. Even a simple Monte Carlo (MC) experiment would probably tell us what the right threshold is.
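
To make the MC idea concrete, here is a minimal sketch. The statistic is purely an assumption for illustration: the mean of n standard-normal residuals, standing in for whatever quantity the inferer is expected to keep "small". The function names (simulate_statistic, mc_null_distribution, small_threshold) are hypothetical, not part of any existing code.

```python
import random
import statistics


def simulate_statistic(n_samples, rng):
    """Hypothetical health statistic: mean of n_samples standard-normal residuals.

    Swap this for a simulation of the real quantity under a healthy inferer.
    """
    return statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n_samples))


def mc_null_distribution(n_samples, n_trials, seed=0):
    """Draw the statistic n_trials times under the assumed healthy model."""
    rng = random.Random(seed)
    return [simulate_statistic(n_samples, rng) for _ in range(n_trials)]


def small_threshold(draws, quantile=0.99):
    """Absolute value below which roughly `quantile` of the healthy draws fall."""
    ordered = sorted(abs(x) for x in draws)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]


draws = mc_null_distribution(n_samples=100, n_trials=5000)
mc_stdev = statistics.stdev(draws)   # should be close to 1 / sqrt(100) = 0.1
threshold = small_threshold(draws)   # values above this are "not small"
print(f"MC stdev: {mc_stdev:.4f}, 99% threshold on |statistic|: {threshold:.4f}")
```

For this toy statistic the analytic answer is known (stdev = 1/sqrt(n)), which gives a sanity check on the simulation; for the real quantity the MC estimate would be the main source of the threshold.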

But maybe we actually want to observe the typical values we get in practice, and learn from them or decide based on them.
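
The empirical route could look roughly like this: collect the values seen in practice and derive a band from their median and median absolute deviation (MAD). This is only one possible choice of robust summary, and everything here is an assumption: the helper names (empirical_band, is_unusual), the factor k=3, and the history list, which is placeholder data rather than real observations.

```python
import statistics


def empirical_band(history, k=3.0):
    """Return (low, high) as median +/- k * MAD of the observed values."""
    med = statistics.median(history)
    mad = statistics.median(abs(x - med) for x in history)
    return med - k * mad, med + k * mad


def is_unusual(value, history, k=3.0):
    """Flag a new value that falls outside the empirical band."""
    low, high = empirical_band(history, k)
    return value < low or value > high


# Placeholder values for illustration only, not real measurements.
history = [0.9, 1.1, 1.0, 1.2, 0.8, 1.0, 1.05]
low, high = empirical_band(history)
print(f"band: [{low:.2f}, {high:.2f}]")
print(is_unusual(1.5, history), is_unusual(1.1, history))
```

The median and MAD are used instead of mean and standard deviation so that a few bad runs in the history do not widen the band too much.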