Handling ground truth granularity

Ground truth data is typically delivered at a fixed granularity, depending on fundamental features of the topic or on the data provider. This means that the timing of data points is aligned to a specific resolution, whether it’s seconds, minutes, or any other time interval. For example, in financial markets, price data for assets like stocks or cryptocurrencies might be provided in the form of “candlesticks” at fixed intervals such as 1 minute, 5 minutes, or 1 hour.

However, in the Allora network, new epochs for a topic can begin at arbitrary points, determined by the blockchain consensus protocol. This means that the start times for these epochs are often not aligned with the specific granularity of the ground truth data.

Does this misalignment introduce inaccuracies in the ground truth, and if so, what can we do to mitigate these issues?

To answer this question, we should first be clear about how these topics currently operate. From the perspective of the chain code, the ground truth for a topic is whatever the reputers decide it to be. But this only works well if the reputers agree on what the topic means, so we need some rules for that. Here’s how it’s currently handled, for topic 13:

In the figure, the blue, orange, and green dots are values for the ETH price reported by the platforms once a minute. They differ a little, but at least their difference remains constant on small time scales. The pink vertical lines denote the “nonce block times”, that is, the start of a new epoch. The dark gray region is the “submission window”, during which inferers and forecasters are allowed to submit predictions for that epoch. It lasts for 25% of the epoch (counted in blocks). The light gray region is exactly five minutes long (topic 13 is “ETH 5min Prediction”) and starts at the nonce time. This is the “prediction interval”, meaning that the inferers/network are assumed to know the price at the beginning of it and to predict the price at the end of it. The reputers’ task is accordingly to report the price at the end of the prediction interval. Since a price for this exact time is generally not available, the reputer instead rounds the time to the nearest full minute (in the example, the reputer uses the “Tiingo open ETHUSD” price, shown in blue).

If topic 13 were “ETH 5min log-returns” instead, the return would be computed over the prediction interval, i.e. return = log10(target_price) - log10(base_price), where base_price is the price at the start of the epoch (pink vertical line) and target_price is the price 5 minutes later (end of the light gray region). Again, a real reputer would round both times to the nearest full minute.

Because this rounding moves the measurement away from the intended time, we want to redefine ground truths for all price prediction topics in the following way: the reputer shall measure the base price (if required for a return topic) at the nonce time rounded down to a full minute, and measure the target price the appropriate delay after the base price (5 minutes in the example). This way, the times match up with the data sources and no noise is added through time rounding. Additionally, the base price is always taken before the beginning of the submission window, so it should be available to all inferers.

As a concrete example, I’ll explain how to get the ground truth for topic 58 (“8 hour SOL/USD Log-Return Prediction”, with epoch_length = 52, ground_truth_lag = 5148) in the current Allora infrastructure, following this new definition. This is intentionally done completely manually; for the chain queries, all we use is allorad and the tool yq to parse its YAML output.

Epochs are enumerated by the block height they start at (also called nonce height). We get the nonce height of the latest epoch via

$ allorad q emissions topic 58 --node 'https://allora-rpc.testnet.allora.network' |\
  yq -r .topic.epoch_last_ended
3839781

So the last epoch started at 3839781, and the previous one started at 3839729, the one before that at 3839677, and so on, always in steps of epoch_length. However, these epochs are not yet completed: the ground truth isn’t available yet, so they haven’t been scored yet either. To get the last completed epoch, we need to subtract ground_truth_lag + epoch_length, rounded up to a multiple of epoch_length. In this example that is 5148 + 52 = 5200, which is already a multiple of 52.

So the last completed epoch started at 3839781 - 5200 = 3834581, the previous one at 3834529, and so on (again, in steps of 52).
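
For anyone who prefers to script this arithmetic, here is a minimal Python sketch of the rule described above. The function names are my own and are not part of allorad or the chain code; the numbers are the topic 58 values quoted in this post.

```python
def scoring_delay(epoch_length: int, ground_truth_lag: int) -> int:
    """Blocks between an epoch's nonce height and the height at which it is scored.

    Implements "ground_truth_lag + epoch_length, rounded up to a multiple
    of epoch_length" with integer arithmetic only.
    """
    blocks = ground_truth_lag + epoch_length
    # -(-a // b) is ceiling division for positive integers.
    return -(-blocks // epoch_length) * epoch_length


def last_completed_nonce(latest_nonce: int, epoch_length: int, ground_truth_lag: int) -> int:
    """Nonce height of the most recent epoch that has already been scored."""
    return latest_nonce - scoring_delay(epoch_length, ground_truth_lag)


# Topic 58 values quoted in this post.
EPOCH_LENGTH = 52
GROUND_TRUTH_LAG = 5148
LATEST_NONCE = 3839781

delay = scoring_delay(EPOCH_LENGTH, GROUND_TRUTH_LAG)
completed = last_completed_nonce(LATEST_NONCE, EPOCH_LENGTH, GROUND_TRUTH_LAG)
# The completed epoch and the two before it, in steps of epoch_length.
previous = [completed - i * EPOCH_LENGTH for i in range(3)]
print(delay, completed, previous)
```

Adding the same delay back onto a nonce height gives the height at which that epoch is scored.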

allorad can translate these block numbers to times like this:

$ allorad q blocks --query "block.height = 3834581" --order_by asc --limit 1 \
  --node 'https://allora-rpc.testnet.allora.network' | yq -r '.blocks[0].header.time'
2025-05-14T12:08:53.932801787Z

The times are in UTC.

As explained above, to get the beginning of the prediction interval, we need to round this down to a full minute, and to get the end we need to add 8 hours to the result:

Base time: 2025-05-14T12:08:00
Target time: 2025-05-14T20:08:00
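
The same rounding can be sketched in Python. This is only an illustration of the definition; `prediction_interval` is a name I made up, and the block time is the one printed above, truncated to microseconds because Python datetimes do not store nanoseconds.

```python
from datetime import datetime, timedelta, timezone


def prediction_interval(nonce_time: datetime, horizon: timedelta):
    """Return (base_time, target_time) for an epoch.

    base_time is the nonce time rounded down to a full minute,
    target_time is exactly one horizon later.
    """
    base_time = nonce_time.replace(second=0, microsecond=0)
    return base_time, base_time + horizon


# Block time of nonce height 3834581 (UTC), truncated to microseconds.
nonce_time = datetime(2025, 5, 14, 12, 8, 53, 932801, tzinfo=timezone.utc)

# Topic 58 predicts over an 8 hour horizon.
base_time, target_time = prediction_interval(nonce_time, timedelta(hours=8))
print("Base time:  ", base_time.strftime("%Y-%m-%dT%H:%M:%S"))
print("Target time:", target_time.strftime("%Y-%m-%dT%H:%M:%S"))
```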

We can get the SOL/USD price at these times from services like Tiingo or Binance. We’ll use Binance here because it can be used without an API key. Unfortunately, it’s not available from the US and a few other countries. For the base price:

$ date -u -d 2025-05-14T12:08:00 +%s
1747224480
$ curl -s "https://api.binance.com/api/v3/klines?symbol=SOLUSDT&interval=1m&startTime=1747224480000&limit=1" | jq -r '.[0][1]'
181.09000000

And again for the target price:

$ date -u -d 2025-05-14T20:08:00 +%s
1747253280
$ curl -s "https://api.binance.com/api/v3/klines?symbol=SOLUSDT&interval=1m&startTime=1747253280000&limit=1" | jq -r '.[0][1]'
176.81000000

So the base price is $181.09 and the target price is $176.81.
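
The two requests above differ only in startTime. As a rough sketch, the URL can be built from a UTC minute in Python like this; `klines_url` is a hypothetical helper, and the query parameters are exactly the ones used in the curl commands. The sketch only builds the URL and does not send the request. As in the jq filter above, the open price is element 1 of the first returned candle.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

BINANCE_KLINES = "https://api.binance.com/api/v3/klines"


def klines_url(symbol: str, minute_start: datetime) -> str:
    """Build the request URL for the single 1-minute candle starting at minute_start.

    minute_start must be timezone-aware; Binance expects startTime in
    milliseconds since the Unix epoch.
    """
    start_ms = int(minute_start.timestamp()) * 1000
    query = urlencode(
        {"symbol": symbol, "interval": "1m", "startTime": start_ms, "limit": 1}
    )
    return f"{BINANCE_KLINES}?{query}"


base_time = datetime(2025, 5, 14, 12, 8, tzinfo=timezone.utc)
target_time = datetime(2025, 5, 14, 20, 8, tzinfo=timezone.utc)

base_url = klines_url("SOLUSDT", base_time)
target_url = klines_url("SOLUSDT", target_time)
print(base_url)
print(target_url)
```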

Finally, the topic asks for log-returns. These are computed as log10(target_price) - log10(base_price). In this example, we would get around -0.010387. That is the ground truth for the epoch starting at 3834581.
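
To double-check the number, here is the same computation as a short Python sketch. `log10_return` is just an illustrative name; the prices are the two Binance open prices from above.

```python
import math


def log10_return(base_price: float, target_price: float) -> float:
    """Base-10 log-return from base_price to target_price."""
    return math.log10(target_price) - math.log10(base_price)


# Open prices at the base and target minute, as fetched above.
base_price = 181.09
target_price = 176.81

ground_truth = log10_return(base_price, target_price)
print(ground_truth)
```

The result is negative because the price fell over the interval, and it agrees with the value quoted above to within rounding.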

A related question that often comes up is when an epoch is scored and rewards are distributed. To find that, we just need to add 5200 (ground_truth_lag + epoch_length rounded up as above) to the nonce height, and convert it to a timestamp. For example 3834581 + 5200 = 3839781, and

$ allorad q blocks --query "block.height = 3839781" --order_by asc --limit 1 \
  --node 'https://allora-rpc.testnet.allora.network' | yq -r '.blocks[0].header.time'
2025-05-14T22:28:19.800426618Z

So this epoch was scored on May 14 at around 22:28:20 UTC.

Thank you very much for this great discussion @florian! This should help clearly define which timestamps all network participants (inferers, forecasters, reputers) should use when sourcing their data and generating their predictions.