Thanks!
“Not sure I agree with unhealthy as a factor of >10 difference between max and min”:
I guess my thought was that if the max value was much larger than the min value, this would indicate unhealthiness. But now I realize this is really just detecting how small the smallest entry is. For example, if log10(max/min)>X we have max>min*10^X, but since we are looking at fractions, max<1, which implies min<1/10^X. So with what I did, X=1, the rule can only fire when min<0.1, which doesn’t really have anything to do with health. So I agree. This would make more sense with a factor of 100 between max and min, i.e. X=2, but we are not seeing log10(max/min)>2 in the simulations.
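To make the point concrete, here is a small Python sketch. The example fractions are made up for illustration (they are not from the simulations), and the function names are hypothetical. It checks that whenever the ratio rule fires with X=1, the smallest fraction is below 0.1, and it also shows that a small minimum alone does not guarantee the rule fires.

```python
import math

def log_ratio(fracs):
    """log10 of the largest fraction divided by the smallest."""
    return math.log10(max(fracs) / min(fracs))

def ratio_rule_fires(fracs, x=1.0):
    """True when log10(max/min) exceeds the threshold x."""
    return log_ratio(fracs) > x

# Hypothetical fraction vectors, each summing to 1.
examples = {
    "even": [0.25, 0.25, 0.25, 0.25],
    "one_small": [0.48, 0.47, 0.04, 0.01],
    "dominant_but_min_large": [0.7, 0.1, 0.1, 0.1],
    "small_min_no_fire": [0.32, 0.32, 0.30, 0.06],
}

for name, fracs in examples.items():
    fired = ratio_rule_fires(fracs, x=1.0)
    # If the rule fired, min < max / 10 < 0.1 because max < 1.
    if fired:
        assert min(fracs) < 0.1
    print(f"{name}: min={min(fracs):.2f}, "
          f"log10(max/min)={log_ratio(fracs):.2f}, fired={fired}")
```

The last example has min<0.1 but does not trigger the rule, so min<1/10^X is a necessary condition for the rule rather than an equivalent one.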
“You clearly demonstrate something happens there, but to what extent are you not making an assumption about the decision boundary that implies this as a critical point?”
“So just looking at the data: where does it come from?”
In my script, I am basically fixing an X at the beginning and then developing a decision rule to detect if either std(log10(fracs)) or log10(max/min) is >X. I am the one deciding X: I picked the values 0.5 for the stdev and 1 for the ratio just arbitrarily, based on the plots and their typical ranges. So these values for X aren’t really coming from the data. I’m not sure what the best way to approach that would be.
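For reference, the rule described above can be sketched roughly as follows. This is a minimal illustration and not the actual script: the function names are hypothetical, it assumes all fractions are strictly positive (log10 is undefined at 0), and it assumes the population standard deviation, which may differ from what the script uses.

```python
import math
import statistics

# Hand-picked thresholds from the plots; not derived from the data.
STD_THRESHOLD = 0.5
RATIO_THRESHOLD = 1.0

def spread_metrics(fracs):
    """Return (std(log10(fracs)), log10(max/min)) for positive fractions."""
    logs = [math.log10(f) for f in fracs]
    # Population standard deviation (assumption; the sample version is stdev).
    std_log = statistics.pstdev(logs)
    # log10(max/min) equals max(log10) - min(log10).
    log_ratio = max(logs) - min(logs)
    return std_log, log_ratio

def is_flagged(fracs, std_threshold=STD_THRESHOLD,
               ratio_threshold=RATIO_THRESHOLD):
    """Flag the vector if either metric exceeds its threshold."""
    std_log, log_ratio = spread_metrics(fracs)
    return std_log > std_threshold or log_ratio > ratio_threshold
```

Keeping the two thresholds as parameters makes it easy to swap in values chosen another way later, without touching the rule itself.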
I think this goes back to what we discussed here: it may be useful to monitor these values over time once deployed, and then decide on thresholds based on the experience we get in practice.