To try to solve the narrow z-score range issue, I folded in the “one out” predictions, as these are a simple implementation of the bootstrapping/resampling method described in the trade-off study above.

Just to confirm point 1 above, I first tried calculating the uncertainty on `combinator_prediction_oneout` using the weighted standard deviation, i.e., `combinator_prediction_oneout_std = np.sqrt(np.dot(combinator_weights[:,i], (combinator_prediction_oneout[:,i] - combinator_prediction[i])**2))`
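As a minimal sketch of the weighted calculation above, the per-column expression can be vectorised over all `i` at once. The array shapes (`n_workers` rows, one column per `i`), the assumption that the weights sum to 1 in each column, the helper name `weighted_oneout_std` and the synthetic data are all illustrative assumptions, not taken from the actual pipeline.

```python
import numpy as np

def weighted_oneout_std(weights, oneout, prediction):
    """Weighted spread of the one-out predictions about the full prediction.

    Assumed shapes (hypothetical, for illustration only):
      weights:    (n_workers, n_cols), each column summing to 1
      oneout:     (n_workers, n_cols)
      prediction: (n_cols,)
    """
    sq_dev = (oneout - prediction[np.newaxis, :]) ** 2
    # Summing weights * squared deviation down each column is the same as
    # np.dot(weights[:, i], sq_dev[:, i]) evaluated for every i.
    return np.sqrt(np.sum(weights * sq_dev, axis=0))

# Synthetic stand-in data; the real arrays come from the network.
rng = np.random.default_rng(0)
n_workers, n_cols = 5, 4
combinator_weights = rng.random((n_workers, n_cols))
combinator_weights /= combinator_weights.sum(axis=0)
combinator_prediction = rng.normal(size=n_cols)
combinator_prediction_oneout = combinator_prediction + 0.1 * rng.normal(
    size=(n_workers, n_cols)
)

std_vec = weighted_oneout_std(
    combinator_weights, combinator_prediction_oneout, combinator_prediction
)

# Same quantity computed column by column, as written in the text.
std_loop = np.array([
    np.sqrt(np.dot(
        combinator_weights[:, i],
        (combinator_prediction_oneout[:, i] - combinator_prediction[i]) ** 2,
    ))
    for i in range(n_cols)
])
```

Both forms should agree to floating-point precision; the vectorised one simply avoids the explicit loop over `i`.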

As expected, the weight distribution for `combinator_weights` contains (even more) zero-weight values, leading to the same problem as above. This confirms it is prudent to avoid using weights for statistical calculations until we have had the chance to test for larger/production-ready networks.

I therefore switched to using the non-weighted standard deviation to determine `combinator_prediction_oneout_std` and calculated the z-score for each worker normalized by the number of workers.

Here are plots of the z-score distribution (green) of the total network output, calculated using `z_scores = (returns - combinator_prediction) / combinator_prediction_oneout_std`, overlaid with a Gaussian fit to the z-scores (black dashed line, fit results in top left) and the expected distribution (red line: Gaussian, mean=0, std=1). In the top plot, `combinator_weights` have been used to derive `combinator_prediction_oneout_std`. In the middle plot, no weights have been applied in calculating `combinator_prediction_oneout_std`. In the bottom plot, no weights are used, and the z-scores are normalised by `z_scores /= np.sqrt(n_workers)`.
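The unweighted variants used for the middle and bottom plots can be sketched as follows. This is an illustration under stated assumptions, not the actual analysis code: the spread is taken about `combinator_prediction` (mirroring the weighted expression, whereas `np.std` about the one-out mean would be an alternative), and the helper name `oneout_z_scores`, the array shapes and the synthetic data are all hypothetical.

```python
import numpy as np

def oneout_z_scores(returns, prediction, oneout, normalise=False):
    """z-scores using an unweighted one-out spread.

    Assumed shapes (hypothetical): oneout is (n_workers, n_cols);
    returns and prediction are (n_cols,).
    """
    n_workers = oneout.shape[0]
    # Unweighted RMS deviation of the one-out predictions about the
    # full prediction (every worker counts equally).
    std = np.sqrt(np.mean((oneout - prediction[np.newaxis, :]) ** 2, axis=0))
    z = (returns - prediction) / std
    if normalise:
        # Bottom-plot variant: z_scores /= np.sqrt(n_workers)
        z = z / np.sqrt(n_workers)
    return z, std

# Synthetic stand-in data for illustration only.
rng = np.random.default_rng(1)
n_workers, n_cols = 8, 1000
combinator_prediction = rng.normal(size=n_cols)
combinator_prediction_oneout = combinator_prediction + 0.2 * rng.normal(
    size=(n_workers, n_cols)
)
returns = combinator_prediction + 0.2 * rng.normal(size=n_cols)

z_middle, oneout_std = oneout_z_scores(
    returns, combinator_prediction, combinator_prediction_oneout
)
z_bottom, _ = oneout_z_scores(
    returns, combinator_prediction, combinator_prediction_oneout, normalise=True
)
```

Here `z_middle` corresponds to the middle-plot calculation and `z_bottom` to the bottom-plot one; they differ only by the constant factor `np.sqrt(n_workers)`.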

Because many of the `combinator_weights` are so small, many of the `combinator_prediction_oneout_std` values are very small, so the spread in z-scores is huge in the top plot. I needed to truncate large values in the array before I could plot. I think this explains the larger std in the top plot.
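One way the truncation step could look is sketched below. The cut-off of 10, the helper name `truncate_for_plot` and the choice to drop (rather than clip) extreme values are assumptions for illustration; the text does not specify how the truncation was actually done.

```python
import numpy as np

def truncate_for_plot(z_scores, limit=10.0):
    """Drop non-finite and extreme z-scores before histogramming.

    `limit` is a hypothetical cut-off; a near-zero standard deviation
    produces very large or infinite z-scores that swamp the histogram.
    """
    z = np.asarray(z_scores, dtype=float)
    keep = np.isfinite(z) & (np.abs(z) <= limit)
    return z[keep]

# Toy example: a few sensible values mixed with blown-up ones.
z_scores = np.array([0.3, -1.2, 250.0, np.inf, -4000.0, 2.1, np.nan])
trimmed = truncate_for_plot(z_scores)
```

Note that any such cut changes the sample the Gaussian is fitted to, so the fitted std in the top plot depends on the chosen `limit`.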

I note that the mean is offset towards negative values in all plots. I think it makes sense to play with the bias to see if we understand that.