We first identified a number of potential improvements to the forecaster model.
Model structure: The first version of the forecaster used a global model, with inferer addresses included as a categorical feature in training. However, if the forecaster cannot adequately distinguish information from different inferers via the address feature, a global model may simply predict the mean for all inferers. As an alternative, we consider a per-inferer method, with a separate forecasting model for each inferer.
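The contrast between the two structures can be sketched as follows. This is a minimal illustration only: the addresses are made up, and a trivial mean predictor stands in for the real forecasting model.

```python
import numpy as np

# Hypothetical loss histories per inferer address (assumed data layout).
history = {
    "addr_a": np.array([0.9, 1.0, 1.1, 1.0]),
    "addr_b": np.array([0.2, 0.3, 0.25, 0.3]),
}

# Global model: pools all inferers; if the address feature carries no
# usable signal, it collapses to predicting the pooled mean for everyone.
global_pred = float(np.mean(np.concatenate(list(history.values()))))

# Per-inferer method: one independent model per address, so each inferer
# gets a prediction fitted to its own history.
per_inferer_pred = {addr: float(losses.mean()) for addr, losses in history.items()}

print(global_pred)       # one shared prediction for all inferers
print(per_inferer_pred)  # a distinct prediction per address
```

The per-inferer predictions differ by address, whereas the degenerate global model returns the same value for both, which is the failure mode described above.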
Target variable: As discussed in the Allora Whitepaper, the first model forecasts the losses for each inferer, which are then converted to regrets to be passed to the weighting function. A potential drawback of this method is that the loss-to-regret conversion must use the network loss at the previous epoch (i.e. R_forecast = log(L_network,prev) - log(L_forecast)) rather than the actual network regret (which is not yet available). If the network regret changes significantly from epoch to epoch, this affects the final weighting. However, a benefit of forecasting losses is that they are independent for each inferer, and therefore do not depend on the makeup of the active set of inferers (see details about merit-based sortition here).
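The loss-to-regret conversion above can be written as a small helper. This is a sketch only: the function name is ours, and it assumes losses are positive so the logarithms are defined.

```python
import math

def regret_from_forecast_loss(network_loss_prev: float, forecast_loss: float) -> float:
    """Proxy regret using the previous epoch's network loss, since the
    current network regret is not yet available at forecast time:
        R_forecast = log(L_network,prev) - log(L_forecast)
    """
    return math.log(network_loss_prev) - math.log(forecast_loss)

# An inferer forecast to halve the previous network loss gets positive regret.
print(regret_from_forecast_loss(1.0, 0.5))
```

If the true network loss at the current epoch differs substantially from the previous one, this proxy regret drifts from the actual regret, which is the weighting sensitivity noted above.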
As alternatives, we consider models that instead forecast the regret or z-score of the regret for each inferer. This way, the forecaster only needs to predict the relative accuracy for each inferer, rather than the absolute accuracy (as for losses). However, these methods could then be sensitive to changes in the active set of inferers if the network loss changes significantly.
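As a sketch of the z-score variant, the regrets of the current active set can be standardised so the target encodes only relative accuracy. The function name and the sample regrets are illustrative, not the actual pipeline.

```python
import numpy as np

def regret_z_scores(regrets) -> np.ndarray:
    # Standardise regrets across the current active set: each inferer's
    # target is its regret relative to the set's mean, in units of the
    # set's standard deviation.
    r = np.asarray(regrets, dtype=float)
    return (r - r.mean()) / r.std()

z = regret_z_scores([0.1, 0.3, -0.2, 0.0])
print(z)
```

Note that adding or removing an inferer changes the mean and standard deviation, and hence every z-score, which is exactly the active-set sensitivity flagged above.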
Feature engineering: A number of engineered properties (exponential moving averages and standard deviations, rolling means, gradients) require an epoch span to be defined for the calculation. As the optimal span length or combination of spans is not obvious a priori (e.g. whether shorter or longer spans are more informative), we will test multiple combinations and select those that produce the best outcomes. We will also test whether the current features are sufficient to detect periodic outperformance, or whether further feature engineering is required.
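For illustration, the span-dependent features could be generated along these lines with pandas. The function name and the (5, 20) span grid are assumptions for the sketch, not the actual pipeline; the span grid is precisely the hyperparameter the tests above would tune.

```python
import numpy as np
import pandas as pd

def add_span_features(losses: pd.Series, spans=(5, 20)) -> pd.DataFrame:
    # Build span-dependent features for a single inferer's loss series:
    # EWM mean/std, rolling mean, and a finite-difference gradient.
    feats = {}
    for span in spans:
        feats[f"ewm_mean_{span}"] = losses.ewm(span=span).mean()
        feats[f"ewm_std_{span}"] = losses.ewm(span=span).std()
        feats[f"roll_mean_{span}"] = losses.rolling(span).mean()
        feats[f"gradient_{span}"] = losses.diff(span) / span  # per-epoch slope
    return pd.DataFrame(feats)

# Toy series: losses declining linearly over 50 epochs.
features = add_span_features(pd.Series(np.linspace(1.0, 0.5, 50)))
print(features.tail(1))
```

Sweeping `spans` over a grid and scoring each feature set with the model-selection tests below would then identify the best combination.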
Next, we must decide on a series of tests to identify the best-performing model and feature set.