To address the narrow z-score range issue, I folded in the “one out” predictions, since they are a simple implementation of the bootstrapping/resampling method described in the trade-off study above.
Just to confirm point 1 above, I first tried calculating the uncertainty on `combinator_prediction_oneout` using the weighted standard deviation, i.e.,

```python
combinator_prediction_oneout_std = np.sqrt(
    np.dot(combinator_weights[:, i],
           (combinator_prediction_oneout[:, i] - combinator_prediction[i]) ** 2)
)
```
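For context, a minimal runnable version of this calculation, vectorised over all events at once; the array shapes (one leave-one-out prediction and one weight per worker/event pair) and the toy data are assumptions on my part:

```python
import numpy as np

# Assumed shapes: n_workers leave-one-out predictions per event;
# combinator_prediction stands in for the full-network output.
n_workers, n_events = 8, 1000
rng = np.random.default_rng(0)
combinator_prediction_oneout = rng.normal(size=(n_workers, n_events))
combinator_prediction = combinator_prediction_oneout.mean(axis=0)  # stand-in
combinator_weights = rng.random(size=(n_workers, n_events))
combinator_weights /= combinator_weights.sum(axis=0)  # normalise per event

# Weighted standard deviation for every event at once
# (equivalent to the per-event np.dot above).
combinator_prediction_oneout_std = np.sqrt(
    np.einsum('we,we->e', combinator_weights,
              (combinator_prediction_oneout - combinator_prediction) ** 2)
)
```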
As expected, the weight distribution for `combinator_weights` contains (even more) zero-weight values, leading to the same problem as above. This confirms it is prudent to avoid using weights for statistical calculations until we have had the chance to test on larger/production-ready networks.
I therefore switched to using the non-weighted standard deviation to determine `combinator_prediction_oneout_std`, and calculated the z-score for each worker, normalised by the number of workers.
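Continuing the sketch above, the non-weighted version and the normalisation look like this; `returns` is a placeholder for the observed returns, which are not defined in this section:

```python
# Non-weighted standard deviation across workers, per event.
combinator_prediction_oneout_std = np.std(combinator_prediction_oneout, axis=0)

# Placeholder for the observed returns used in the z-score.
returns = rng.normal(size=n_events)

# z-score of the total network output, then the sqrt(n_workers)
# normalisation used in the bottom plot below.
z_scores = (returns - combinator_prediction) / combinator_prediction_oneout_std
z_scores /= np.sqrt(n_workers)
```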
Here are plots of the z-score distribution (green) of the total network output, calculated as `z_scores = (returns - combinator_prediction) / combinator_prediction_oneout_std`, overlaid with a Gaussian fit to the z-scores (black dashed line, fit results in the top left) and the expected distribution (red line: Gaussian, mean = 0, std = 1). In the top plot, `combinator_weights` have been used to derive `combinator_prediction_oneout_std`. In the middle plot, no weights have been applied in calculating `combinator_prediction_oneout_std`. In the bottom plot, no weights are used, and the z-scores are normalised by `z_scores /= np.sqrt(n_workers)`.
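Each panel could be produced along these lines; the binning, styling, and use of `scipy.stats.norm.fit` are my assumptions, not necessarily how the plots above were made:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Histogram of the z-scores (green), a Gaussian fit (black dashed),
# and the expected unit Gaussian (red).
fig, ax = plt.subplots()
ax.hist(z_scores, bins=100, density=True, color='green', alpha=0.6)

mu, sigma = norm.fit(z_scores)  # maximum-likelihood Gaussian fit
x = np.linspace(z_scores.min(), z_scores.max(), 500)
ax.plot(x, norm.pdf(x, mu, sigma), 'k--',
        label=f'fit: mean={mu:.2f}, std={sigma:.2f}')
ax.plot(x, norm.pdf(x, 0, 1), 'r-', label='expected: mean=0, std=1')
ax.legend(loc='upper left')
plt.show()
```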
Because many of the `combinator_weights` are so small, many of the `combinator_prediction_oneout_std` values are very small, so the spread in z-scores is huge in the top plot. I needed to truncate large values in the array before I could plot them; I think this explains the larger std in the top plot.
I note that there is an offset in the mean towards negative values in all plots. I think it makes sense to play with the bias to see whether we understand that.