Standard forecaster model

We start by defining the network data, i.e., the on-chain features we collect.

Raw Data

The network operates with a large number of workers. To keep the model efficient and relevant, we focus exclusively on the active set: the subset of workers actively engaged in the network at a given epoch. Forecasts are generated only for the active set, which lets the model run quickly while maintaining accuracy. The active set is selected via Merit-Based Sortition; details can be found here.
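As a minimal sketch of what restricting the data to the active set means in practice, the Python below keeps only records from workers that were in the active set for the corresponding epoch. It assumes the active set per epoch is already known and passed in as input (Merit-Based Sortition itself is not modeled here). `WorkerRecord` and `filter_active` are hypothetical names used for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class WorkerRecord:
    # Hypothetical minimal record: one worker's inference at one epoch.
    epoch: int
    address: str
    inference: float


def filter_active(
    records: List[WorkerRecord], active_sets: Dict[int, Set[str]]
) -> List[WorkerRecord]:
    """Keep only records from workers in that epoch's active set.

    active_sets maps epoch -> addresses in the active set. Here it is simply
    an input; how it is chosen (Merit-Based Sortition) is out of scope.
    """
    return [r for r in records if r.address in active_sets.get(r.epoch, set())]


records = [
    WorkerRecord(epoch=1, address="worker-a", inference=0.52),
    WorkerRecord(epoch=1, address="worker-b", inference=0.47),
    WorkerRecord(epoch=2, address="worker-b", inference=0.49),
]
active = {1: {"worker-a"}, 2: {"worker-b"}}
kept = filter_active(records, active)
```

Here `worker-b`'s epoch-1 record is dropped because it was not in that epoch's active set, so every downstream step only ever sees active-set data.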

For the active set, we collect historical data on the following key features:

  1. Inferences: The raw inferences whose losses we are forecasting. This is arguably the most important feature!
  2. Losses: The actual inference losses, which provide the target data for training the forecaster.
  3. Rewards: Reflect each worker’s unique contribution and are correlated with their inference losses and scores.
  4. Scores: Quantify the impact of individual predictors on the network’s overall performance, helping interpret losses and rewards.
  5. Addresses: Identify the worker behind each data point, providing context awareness.
  6. Timestamp and block height: Essential for synchronizing with private data.

Each of these features contributes uniquely to the forecaster’s ability to generate context-aware and accurate forecasts.
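To make the role of each feature concrete, here is a sketch of one way to turn a single worker's history into training examples, with the loss as the target. The field names are hypothetical, and the lag-by-one layout is our assumption rather than the documented design: at forecast time the current epoch's loss, reward, and score are not yet available, so they enter only as previous-epoch values, which avoids leaking the target into the features.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Observation:
    # Hypothetical per-worker, per-epoch record of the collected features.
    address: str       # worker address (context awareness)
    block_height: int  # used to synchronize with private data
    timestamp: int     # Unix seconds
    inference: float   # raw inference whose loss is forecast
    loss: float        # realized loss: the training target
    reward: float
    score: float


def training_pairs(history: List[Observation]) -> List[Tuple[Dict, float]]:
    """Build (features, target) pairs from one worker's history.

    The target is the loss at step t. Loss, reward, and score are used only
    as step t-1 values (assumption: they are unknown when forecasting step t).
    """
    ordered = sorted(history, key=lambda o: o.block_height)
    pairs = []
    for prev, cur in zip(ordered, ordered[1:]):
        features = {
            "address": cur.address,
            "block_height": cur.block_height,
            "timestamp": cur.timestamp,
            "inference": cur.inference,
            "prev_loss": prev.loss,
            "prev_reward": prev.reward,
            "prev_score": prev.score,
        }
        pairs.append((features, cur.loss))
    return pairs


history = [
    Observation("worker-a", 100, 1_700_000_000, 0.50, 0.020, 1.0, 0.30),
    Observation("worker-a", 101, 1_700_000_300, 0.55, 0.010, 1.2, 0.40),
    Observation("worker-a", 102, 1_700_000_600, 0.53, 0.015, 1.1, 0.35),
]
pairs = training_pairs(history)
```

A history of n epochs yields n - 1 examples, since the first epoch has no previous values to draw on.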

Feature Engineering

To enhance predictive power, we apply a suite of off-the-shelf feature engineering transformations to the raw network data. For the inferences, rewards, and scores, we apply the following transformations:

  • Momentum Features: Capture directional trends in data.
  • Temporal Features: Reflect the time of day or day of the week.
  • Simple Differencing: Isolate changes between consecutive values to highlight deviations.
  • Rolling Mean and Standard Deviation: Provide smoothed averages and variability measures over fixed windows, helping the model adapt to changing patterns.
  • Many more!
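A minimal, dependency-free sketch of the transformations listed above, applied to one worker's series (for example, its inferences). These are common textbook definitions chosen for illustration, not necessarily the exact ones the forecaster uses. In particular, momentum here is the k-step change x[t] - x[t-k], which is only one of several definitions in use. All function names are hypothetical; positions without enough history are `None`.

```python
from datetime import datetime, timezone
from statistics import mean, stdev
from typing import Dict, List, Optional, Tuple

Series = List[float]
Feature = List[Optional[float]]


def diff(xs: Series) -> Feature:
    """Simple differencing: x[t] - x[t-1]."""
    return [None] + [b - a for a, b in zip(xs, xs[1:])]


def momentum(xs: Series, k: int) -> Feature:
    """k-step momentum: x[t] - x[t-k], capturing the direction of the trend."""
    return [None if t < k else xs[t] - xs[t - k] for t in range(len(xs))]


def rolling_mean_std(xs: Series, window: int) -> Tuple[Feature, Feature]:
    """Trailing-window mean and sample standard deviation."""
    if window < 2:
        raise ValueError("window must be >= 2 for a sample standard deviation")
    means: Feature = []
    stds: Feature = []
    for t in range(len(xs)):
        if t + 1 < window:
            means.append(None)
            stds.append(None)
        else:
            w = xs[t + 1 - window : t + 1]
            means.append(mean(w))
            stds.append(stdev(w))
    return means, stds


def temporal(ts: int) -> Dict[str, int]:
    """Hour of day and day of week (Monday = 0) for a Unix timestamp, in UTC."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return {"hour": dt.hour, "weekday": dt.weekday()}
```

Each transformation only looks backward in time, so the resulting features can be computed at forecast time without peeking at future values.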

We have found that these standard time-series transformations improve the overall accuracy of forecasts. The infrastructure is designed to be extensible, allowing both us and users to incorporate additional transformations as needed.
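One way such extensibility could look (our assumption, not a description of the actual infrastructure) is a simple registry: each transformation is a plain function registered under a name, and the pipeline applies whatever happens to be registered. `register`, `TRANSFORMS`, and `apply_all` are hypothetical names.

```python
from itertools import accumulate
from typing import Callable, Dict, List, Optional

Series = List[float]
Transform = Callable[[Series], List[Optional[float]]]

# Hypothetical global registry: name -> transformation function.
TRANSFORMS: Dict[str, Transform] = {}


def register(name: str):
    """Decorator that adds a transformation to the registry under `name`."""
    def wrap(fn: Transform) -> Transform:
        if name in TRANSFORMS:
            raise ValueError(f"transform {name!r} already registered")
        TRANSFORMS[name] = fn
        return fn
    return wrap


@register("diff")
def diff(xs: Series) -> List[Optional[float]]:
    """Built-in transformation: simple differencing."""
    return [None] + [b - a for a, b in zip(xs, xs[1:])]


@register("cumsum")
def cumsum(xs: Series) -> List[Optional[float]]:
    """Example of a user-supplied transformation: running total."""
    return list(accumulate(xs))


def apply_all(xs: Series) -> Dict[str, List[Optional[float]]]:
    """Apply every registered transformation to one series."""
    return {name: fn(xs) for name, fn in TRANSFORMS.items()}
```

With this layout, adding a transformation means registering one function; the code that applies the transformations never has to change.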

By focusing on the active set, leveraging key features, and applying meaningful transformations, the forecaster model extracts insights from network data efficiently and effectively. This infrastructure forms the basis of the model’s predictive capabilities, enabling it to deliver accurate forecasts for the network.
