Research Monitoring Suite: How We Watch the Allora Network

In this post I want to showcase the Research Monitoring Suite (RMS), the internal infrastructure we use to observe the Allora Network. From validating theoretical models to catching subtle bugs, the highlights below demonstrate how comprehensive monitoring enables us to build a robust decentralized intelligence system.


What is the RMS?

The Research Monitoring Suite (RMS) is our internal system for understanding what’s actually happening on the Allora Network. While it’s not accessible to external users at the moment, we want to share some insight into our approach, because the quality of a decentralized system depends heavily on how well its developers can observe and understand its behavior.

How it works: We built a custom indexer that monitors events and transactions from the Allora blockchain and populates a PostgreSQL database with structured, queryable data. Every inference submission, every regret calculation, every reward distribution is captured and indexed. We augment this with a second database containing ground-truth data, currently mainly OHLC data from crypto markets. On top of that, we built a collection of dashboards to visualize and analyze the data.
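To make the pattern concrete, here is a minimal sketch of the indexing idea. The table and event names are illustrative, not the actual RMS schema, and SQLite stands in for PostgreSQL so the example is self-contained:

```python
import sqlite3

# Chain events land as rows in a structured, queryable table.
# (Illustrative schema — not the real RMS tables.)
conn = sqlite3.connect(":memory:")  # stands in for PostgreSQL
conn.execute("""
    CREATE TABLE chain_events (
        height     INTEGER,
        event_type TEXT,
        topic_id   INTEGER,
        actor      TEXT,
        payload    TEXT
    )
""")
events = [
    (100, "inference_submitted", 1, "worker_a", '{"value": 0.012}'),
    (100, "regret_updated",      1, "worker_a", '{"regret": -0.3}'),
    (101, "reward_distributed",  1, "worker_a", '{"amount": 5.0}'),
]
conn.executemany("INSERT INTO chain_events VALUES (?, ?, ?, ?, ?)", events)

# Once indexed, any question becomes a query:
rows = conn.execute(
    "SELECT event_type, COUNT(*) FROM chain_events GROUP BY event_type"
).fetchall()
```

The point of the design is exactly this last step: once events are rows, every investigation starts as a query rather than a log-grepping exercise.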

Scale: On testnet alone, we have processed over 30 million transactions and 500 million chain events. This volume of data is essential for statistical validity — many of the findings in this post required analyzing patterns across thousands of data points.

Designed for iteration: The data in our database is intentionally only lightly processed from the raw events and transactions. This, together with using Jupyter notebooks to organize the dashboards, keeps the research workflow flexible — when we notice something strange in the data, we can immediately write queries to investigate, visualize patterns, and share findings with the team. Furthermore, we built the indexer so that it can be repopulated from scratch from an archive node within a few hours. This matters when we roll out updates to our constantly evolving blockchain, or find additional events we want to instrument. Overall, fast iteration was one of the key design principles of our monitoring system.

Below, we show some highlights which emerged from this infrastructure. Having comprehensive, queryable access to network history allowed us to validate theoretical models against real behavior, discover edge cases we hadn’t anticipated, and build confidence that our protocol designs work as intended.


1. Worker Performance Analysis

A core challenge in evaluating inference workers: how do you distinguish genuine predictive skill from luck? With enough workers making predictions, some will appear to perform well purely by chance. The RMS includes statistical frameworks specifically designed to answer this question.

The Worker Performance Dashboard

Our worker performance dashboard evaluates each inferer against multiple metrics, each with statistical tests to determine significance:

  • Directional accuracy: What fraction of predictions got the sign right? We want >55% accuracy, but more importantly, we compute binomial p-values to test whether this accuracy is statistically distinguishable from random guessing. A worker showing 57% accuracy over 50 predictions might be lucky; 57% over 500 predictions is meaningful.
  • Correlation with ground truth: We compute Pearson correlation between predictions and actual values, along with t-test p-values. A positive correlation only counts if the p-value is below 0.05.
  • Improvement over baselines: How much better is this worker than a naive “predict zero returns” model? We track percentage improvement on multiple loss metrics (like RMSE or our custom-built “CZAR” loss) to ensure workers are adding value beyond trivial strategies.
  • Prediction calibration: The “aspect ratio” metric checks whether a worker’s predictions have similar variance to actual values. A worker who always predicts tiny movements when actual returns are volatile—or vice versa—fails this check, even if their directional accuracy looks good.
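A stripped-down sketch of these four checks (illustrative names and pure-Python stand-ins for the real statistical tests; the actual dashboard also computes t-test p-values and more):

```python
import math
import statistics

def binomial_pvalue(correct, n, p=0.5):
    """One-sided P(X >= correct) under Binomial(n, p): tests whether
    directional accuracy is distinguishable from random guessing."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(correct, n + 1))

def pearson(xs, ys):
    """Pearson correlation between predictions and actual values."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs)
                           * sum((y - my) ** 2 for y in ys))

def rmse(preds, actuals):
    return math.sqrt(statistics.fmean((p - a) ** 2
                                      for p, a in zip(preds, actuals)))

def evaluate(preds, actuals):
    """Illustrative per-worker summary over one topic/window."""
    n = len(preds)
    correct = sum(1 for p, a in zip(preds, actuals) if p * a > 0)
    baseline = rmse([0.0] * n, actuals)  # naive "predict zero returns" model
    return {
        "directional_accuracy": correct / n,
        "binomial_p": binomial_pvalue(correct, n),
        "correlation": pearson(preds, actuals),
        "rmse_improvement": 1 - rmse(preds, actuals) / baseline,
        "aspect_ratio": statistics.pstdev(preds) / statistics.pstdev(actuals),
    }
```

Note how the binomial p-value encodes the sample-size point from above: the same 57% accuracy yields a far smaller p-value over 500 predictions than over 50.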

Distinguishing Signal from Noise

The key insight: any single metric can be gamed or achieved by luck. A worker might have good directional accuracy but terrible calibration. Another might show strong correlation over a short window that doesn’t persist.

The dashboard synthesizes all metrics into a summary score, and tracks each metric over time. Workers must pass multiple independent tests consistently to be considered genuinely skilled. This framework was essential for evaluating third-party model providers—we could objectively demonstrate whether contracted workers were delivering real value.

Dynamic Thresholds

Performance thresholds aren’t static. What counts as “good” directional accuracy depends on the asset and time period—some markets are inherently more predictable than others. We integrated dynamic thresholds that adjust based on what baseline models achieve on the same data, ensuring we’re always measuring improvement over realistic benchmarks rather than arbitrary fixed targets.
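The idea can be sketched in one line (the margin and baseline set are hypothetical, not our production values):

```python
def dynamic_accuracy_threshold(baseline_accuracies, margin=0.02):
    """Hypothetical sketch: instead of a fixed 55% bar, require workers to
    beat the best naive baseline measured on the same asset and window
    (e.g. always-up, momentum) by a margin."""
    return max(baseline_accuracies) + margin
```

On a window where the best naive baseline already hits 53% directional accuracy, the bar for a “skilled” worker moves up accordingly, rather than staying at an arbitrary fixed target.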


2. Sortition Analysis & Visualization

Sortition is the process by which workers compete for active slots on topics. The RMS gives us visibility into how this process plays out across the network.

Tracking Score Dynamics

One key plot in the RMS tracks EMA scores against regret values for all workers on a topic; these are the quantities that determine which workers get to participate. Each line represents a worker: red means the worker is active, blue inactive, and black means it stopped participating. You can clearly see how the scores of most inferers decay exponentially towards a value close to the boundary. These are the dynamics of merit-based sortition as described in [Sortition paper]: every inactive worker's score is updated with a quantile of the active scores, so it rises until the worker joins the active set, while every active worker's score is updated by its one-out loss. In other words, an active worker constantly has to prove its value to the active set, or its score will drop below the threshold.
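The update rule behind these dynamics can be sketched as follows (illustrative names and constants, not the on-chain implementation):

```python
def sortition_step(scores, active, one_out_scores, q=0.25, alpha=0.2):
    """One round of merit-based sortition, as a sketch: active workers' EMA
    scores track their one-out scores, while inactive workers are pulled
    toward a quantile of the active set's scores, so they eventually
    re-enter the active set."""
    vals = sorted(scores[w] for w in active)
    inactive_target = vals[int(q * (len(vals) - 1))]
    return {
        w: alpha * (one_out_scores[w] if w in active else inactive_target)
           + (1 - alpha) * s
        for w, s in scores.items()
    }
```

Iterating this map produces exactly the exponential decay toward the boundary that the plot shows: each worker's score is a geometric average of its past targets.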

Discovering the InstantScore vs EMA Bug

The plot shown here is actually from an older version of the network, and it revealed a subtle flaw: workers rarely switched from the inactive to the active set. The attractor of the inactive set sat below the attractor of the active set, when the opposite should be true! In other words, workers were being held on the waiting list far longer than our models predicted.

The issue turned out to have a simple mathematical reason: the original sortition algorithm was effectively comparing a smoothed quantile value of worker scores with a quantile of smoothed scores. Those are comparable if scores don’t move much, but in a volatile situation the smoothed quantile tends to be much more extreme! We solved it by making a small but impactful change to the sortition mechanism: instead of updating inactive scores with a quantile of the active “instant” scores, we use a quantile of the smoothed “EMA scores”. This ensures that the inactive scores get attracted to a higher value, which guarantees that they eventually make it into the active set.
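A small simulation illustrates the effect (purely a toy, with standard-normal instant scores, not real network data): for a low quantile of volatile scores, the smoothed quantile of instant scores settles at a much more extreme value than the same quantile taken over the smoothed EMA scores.

```python
import random

def ema(prev, x, alpha=0.2):
    return alpha * x + (1 - alpha) * prev

def quantile(vals, q):
    vs = sorted(vals)
    return vs[int(q * (len(vs) - 1))]

random.seed(0)
n_workers, q = 50, 0.25
emas = [0.0] * n_workers
old_target = 0.0  # pre-fix: EMA of the quantile of *instant* scores
for _ in range(500):
    instant = [random.gauss(0.0, 1.0) for _ in range(n_workers)]
    emas = [ema(e, s) for e, s in zip(emas, instant)]
    old_target = ema(old_target, quantile(instant, q))
new_target = quantile(emas, q)  # post-fix: quantile of the *EMA* scores
# With volatile scores, new_target sits well above old_target, so inactive
# workers are attracted to a higher value and eventually enter the active set.
```

Smoothing shrinks the spread of the score distribution, so a quantile of the smoothed scores is closer to the mean than the smoothed quantile of the raw scores — which is exactly why swapping the order of the two operations moved the inactive attractor above the boundary.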

Animated Visualization

We had a little fun with this and also made an animated variant of the sortition plot. With block height as the time axis, you can watch workers move through the EMA/regret space: new entrants climbing from the waiting list, established workers maintaining position, poor performers getting pushed out. It actually turned out quite useful for building intuition about sortition dynamics over longer time periods.


3. Rewards Distribution

A very simple RMS plot that turned out to be one of the most useful is this one, which shows the distribution of rewards between the different actor classes in a topic:

The reason this is so useful is that it sits downstream of almost every component of the network: any change in worker composition or behavior, topic weight, or network-wide tokenomics will show up in this plot in some way.

On a network wide level, we continuously monitor the distribution of rewards between topics, governed by the topic weights

which in turn are determined by the reputer stake and topic revenue. This particular topic (testnet topic 69) has seen a couple of recent funding increases (the blue line), which results in a slower, smoother increase in its weight (the red line):
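As a rough sketch of that relationship (the combining function, exponent, and smoothing constant here are illustrative assumptions, not the actual on-chain weight formula), the target weight can be modeled as a function of stake and revenue, passed through an EMA so that a funding jump shows up as a slower, smoother rise in weight:

```python
def update_topic_weight(prev_weight, stake, revenue, alpha=0.1, p=0.5):
    """Hypothetical sketch: target weight as a stake/revenue combination
    (here a geometric mean), smoothed with an EMA so that step changes in
    funding produce a gradual change in weight."""
    target = (stake ** p) * (revenue ** (1 - p))
    return alpha * target + (1 - alpha) * prev_weight
```

Feeding a step increase in revenue through this update reproduces the qualitative picture in the plot: the blue funding line jumps, the red weight line bends upward and approaches its new level gradually.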

Exponential Reward Drop-off Around Reputer Intermittence

We once noticed an exponential decay pattern correlated with intermittent reputer participation. The pattern wasn’t predicted by any of our models.

Drilling into the data revealed subtle interactions between:

  • The EMA smoothing in reward calculations
  • The timing of reputer submissions
  • How missing submissions are handled
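The interaction can be illustrated with a toy model (purely hypothetical, not the actual reward formula): if a missed submission enters the EMA as a zero rather than being skipped, every round a reputer misses multiplies their smoothed reward by (1 − α), producing exactly the kind of exponential drop-off we observed.

```python
def smoothed_rewards(history, alpha=0.25):
    """Toy model: EMA of per-round reward scores where a missed round
    (None) contributes zero. Each consecutive miss shrinks the EMA by
    a factor of (1 - alpha), i.e. geometric/exponential decay."""
    s, out = 0.0, []
    for score in history:  # None = missed submission
        s = alpha * (score if score is not None else 0.0) + (1 - alpha) * s
        out.append(s)
    return out
```

In this toy model a reputer who submits intermittently never recovers their full smoothed reward between gaps, which matches the decay pattern we saw in the dashboard.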

Understanding this pattern — which was only visible because the RMS tracks reward histories continuously — helped us make changes to the reputer composition that made topics more robust. It's just one example of countless such tweaks we've made throughout Allora's history, most of them prompted by something we saw in the RMS.

Conclusion

The RMS has become essential infrastructure for understanding and improving the Allora Network. It allows us to validate theoretical models against real behavior, catch edge cases before they become problems, and investigate anomalies with the depth they require.

In this post we highlighted just a few examples of the insights we get from the RMS every day, which help us continuously improve the Allora Network.

The pattern is clear: invest in observability, and the network improves. The RMS continues to evolve alongside Allora, with new queries and dashboards added as we discover new questions to ask.
