I’ve been working on getting a suite of models implemented and trained.
The data, as introduced in my previous posts, has been engineered to be compatible with most off-the-shelf machine learning models. Here I want to offer a variety of initial models and engineer a training pipeline that allows any of them to perform well out of the box. This keeps the system flexible and gives users a strong starting point to build upon.
Model Suite
Generally, the network will benefit from a diverse pool of forecaster models. We offer several:
- Tree-Based Models:
  - LightGBM and XGBoost: Efficient, scalable, and excellent for structured data with categorical features.
  - RandomForest and ExtraTreesRegressor: Robust ensemble methods for capturing complex interactions.
  - GradientBoostingRegressor: A slower but highly accurate boosting algorithm.
- Neural Networks:
  - Flexible architectures for capturing non-linear relationships.
This diverse pool accommodates different user preferences and also serves as a foundation for us to test how diverse forecasts influence the network.
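As a rough sketch, the pool can be assembled as a dictionary of scikit-learn-compatible regressors. The constructor arguments below are illustrative defaults, not the tuned settings used in the pipeline:

```python
from lightgbm import LGBMRegressor
from xgboost import XGBRegressor
from sklearn.ensemble import (
    RandomForestRegressor,
    ExtraTreesRegressor,
    GradientBoostingRegressor,
)
from sklearn.neural_network import MLPRegressor

# Illustrative model pool; names and default arguments are assumptions,
# not the exact configuration used in the forecaster.
MODEL_POOL = {
    "LightGBM": LGBMRegressor(),
    "XGBoost": XGBRegressor(),
    "RandomForest": RandomForestRegressor(),
    "ExtraTrees": ExtraTreesRegressor(),
    "GradientBoosting": GradientBoostingRegressor(),
    "NeuralNetwork": MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=1000),
}
```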
Training Workflow
We focus on making the models as accurate as possible out of the box through a systematic training process:
- Grid Search Cross-Validation:
  - Hyperparameter tuning is performed using grid search with cross-validation to find the optimal settings for each model. Each model is configured with an initial set of parameters to search over.
  - Cross-validation ensures robust performance across different data splits, minimizing overfitting and improving generalizability.
- Full Data Training:
  - After determining the best parameters, the model is trained on the full dataset to maximize its exposure to historical patterns and its ability to capture the most recent data.
  - The trained model and its parameters are saved for use during forecast generation.
This process ensures that the forecaster is both robust and ready to handle new data with minimal adjustments.
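A minimal sketch of this workflow, assuming scikit-learn-style estimators and an illustrative (not the actual) parameter grid. GridSearchCV's refit=True handles the final full-data training, and joblib persists the result:

```python
import joblib
from sklearn.model_selection import GridSearchCV

# Illustrative search spaces; the real grids are model-specific and larger.
PARAM_GRIDS = {
    "RandomForest": {"n_estimators": [200, 500], "max_depth": [None, 10, 20]},
    "LightGBM": {"num_leaves": [31, 63], "learning_rate": [0.05, 0.1]},
}

def tune_and_fit(name, model, X, y, cv=5):
    """Grid-search hyperparameters with cross-validation, then refit on all data."""
    search = GridSearchCV(
        model,
        PARAM_GRIDS.get(name, {}),   # empty grid -> just fit the defaults
        cv=cv,
        scoring="neg_mean_squared_error",
        refit=True,                  # retrain the best configuration on the full dataset
    )
    search.fit(X, y)
    # Persist the trained model and its parameters for forecast generation.
    joblib.dump(
        {"model": search.best_estimator_, "params": search.best_params_},
        f"{name}_forecaster.joblib",
    )
    return search.best_estimator_, search.best_params_
```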
Validation
I’ve implemented the process outlined above using data from the test network. Here I focus on topics 3 and 5, which are the BTC and SOL 10 minute predictions, respectively. To perform these tests I first find the best parameters using grid search cross-validation. Then I iterate over the most recent active inferences: for each one, I remove that inference from the dataset to use as a test point, train on the remaining data, test the models on the removed point, and compute error metrics. This lets us directly test how well the models forecast the most recent losses.
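A sketch of that evaluation loop, assuming X and y are time-ordered NumPy arrays and that n_recent (a placeholder, not a value from the pipeline) controls how many of the most recent inferences are held out one at a time:

```python
import numpy as np
from sklearn.metrics import (
    mean_squared_error,
    mean_absolute_error,
    mean_absolute_percentage_error,
)

def evaluate_recent(model, X, y, n_recent=20):
    """Hold out each of the most recent points in turn, train on the rest,
    and forecast the held-out point."""
    y_true, y_pred = [], []
    for idx in range(len(y) - n_recent, len(y)):
        mask = np.ones(len(y), dtype=bool)
        mask[idx] = False              # remove one recent inference as the test point
        model.fit(X[mask], y[mask])    # train on the remaining data
        y_true.append(y[idx])
        y_pred.append(model.predict(X[~mask])[0])
    return {
        "MSE": mean_squared_error(y_true, y_pred),
        "MAE": mean_absolute_error(y_true, y_pred),
        "MAPE": mean_absolute_percentage_error(y_true, y_pred),
    }
```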
Topic 3: BTC 10 min prediction
• Extra Trees
- Mean Squared Error: 0.2778
- Mean Absolute Error: 0.3829
- Mean Absolute Percentage Error: 6.79%
• Gradient Boosting
- Mean Squared Error: 0.3687
- Mean Absolute Error: 0.5391
- Mean Absolute Percentage Error: 8.60%
• Random Forest
- Mean Squared Error: 0.4050
- Mean Absolute Error: 0.4834
- Mean Absolute Percentage Error: 8.19%
• LightGBM
- Mean Squared Error: 0.4341
- Mean Absolute Error: 0.5260
- Mean Absolute Percentage Error: 8.73%
• Neural Network
- Mean Squared Error: 0.5356
- Mean Absolute Error: 0.5850
- Mean Absolute Percentage Error: 9.57%
• XGBoost
- Mean Squared Error: 1.0785
- Mean Absolute Error: 0.8792
- Mean Absolute Percentage Error: 13.49%
Topic 5: SOL 10 min prediction
• LightGBM
- Mean Squared Error: 0.1398
- Mean Absolute Error: 0.2853
- Mean Absolute Percentage Error: 3.46%
• Random Forest
- Mean Squared Error: 0.1600
- Mean Absolute Error: 0.2863
- Mean Absolute Percentage Error: 4.02%
• Gradient Boosting
- Mean Squared Error: 0.2376
- Mean Absolute Error: 0.3786
- Mean Absolute Percentage Error: 5.94%
• Extra Trees
- Mean Squared Error: 0.2378
- Mean Absolute Error: 0.4117
- Mean Absolute Percentage Error: 6.63%
• XGBoost
- Mean Squared Error: 0.3154
- Mean Absolute Error: 0.4058
- Mean Absolute Percentage Error: 4.95%
• Neural Network
- Mean Squared Error: 0.4380
- Mean Absolute Error: 0.5333
- Mean Absolute Percentage Error: 7.75%
The scatter plots show the true reported losses from the test network on the x-axis and the forecasted losses on the y-axis, colored by model type. The values in the bulleted lists are averaged over all tests and sorted by MSE. The main takeaway is that every model demonstrates the ability to provide accurate forecasts out of sample. The two topics correspond to two different learning tasks, and the results above show that the models' relative performance varies across topics, which highlights the need for a diverse pool of forecasters.
These results demonstrate that the forecaster models perform well across different topics, generalize effectively outside the training data, and provide accurate forecasts with minimal tuning.
In this pipeline I deliberately avoided train/test splits based on timestamps, as is typically done for time series. Our setup presents a unique challenge: workers have different amounts of historical data, making standard time-series splits impractical. Custom solutions are required; here I opted to use all of the data with off-the-shelf cross-validation, but other custom approaches could be applicable.
By combining a flexible model suite with robust training methods we ensure the forecaster is accurate and generalizable. The infrastructure we’ve built enables quick experimentation with different models, making the system adaptable to evolving network scenarios.