Merit-based sortition

I’ve been thinking about the need to define an “active” set of workers or reputers who are participating. During testnet, Allora’s had tens of thousands of workers, and it is computationally infeasible to accommodate all of these in the inference synthesis calculation. So we need some limit on the number of participants for any participant class, and then some way of populating this “active” set from the inactive set, which is constituted by all other participants.

The naive way to do this would be random selection, or “sortition” as it was called in ancient Athens. Across the full set of participants, there exists some degree of redundancy that does not require all participants to be involved in the decision process. This would help us address the concrete computational limitations that make it infeasible to involve all participants.

Allora isn’t best suited for classical, random sortition though. In Allora, the goal of sortition would be to improve efficiency not only while maintaining representativeness, but also while optimizing performance. The latter goal is achieved not through random sortition, but by letting the performance of each participant influence their probability of being drafted. I like to refer to this concept as merit-based sortition.

My goal here is to develop a simple algorithm for merit-based sortition that can be used to increase computational efficiency by limiting active participation without sacrificing (and generally improving) performance.

Cc @steve @Renata for :eyes:

We need an algorithm that:
(1) optimizes the quality of the active set of participants by letting the probability of relegation out of the active set decrease with the participant’s quality;
(2) retains fairness and representativeness by allowing inactive participants an infinite number of chances to be drafted into the active set, in such a way that the probability and frequency of promotion increase with the participant’s quality.

We cannot use instantaneous scores for this, because:
(1) Scores are likely to be volatile, and Q_{i-1} may be a poor predictor for the expectation value at the current epoch E(Q_i).
(2) The score during the previous epoch is unavailable for the inactive participants, even if the participant might have been active at an earlier epoch i'. The distribution of scores at i' may have been different from that at i-1, making Q_i' unsuitable for use in the ranking. This means that the active set is impermeable, because inactive participants cannot be ranked without assigning them an appropriate score.

I think we can overcome these problems with exponential moving averages (EMAs), where the active participants get their EMA updated with their actual score, and the inactive participants get their EMA updated with some percentile of the active ones. This has a number of favourable properties.

Formally, this implies defining the score EMA at epoch i of worker j as:
Screenshot 2024-09-09 at 10.55.17
where for the active set we write the target score using the actual instantaneous score:
Screenshot 2024-09-09 at 10.56.31
and for the inactive set we use some percentile of the active set
Screenshot 2024-09-09 at 10.57.17
where we need to specify the percentile P. If f_P(T_{ij'}) is not instantly available, it is also possible to instead use the percentile from the preceding epoch, i.e. f_P(T_{i-1,j'}).

The percentile P defines what fraction of the active set gets relegated and renewed, and thus also controls the rate of cycling between the active and inactive sets. At any epoch i, the bottom P-th percentile of participants in A_i (ordered by Q_{ij}) are at risk of getting demoted, and will be replaced if any participants in I_i have sufficiently high Q_{i-1,j}.

I will add some experiments demonstrating that this algorithm exhibits the desired behaviour.

Here is a very simple simulation of the above algorithm:


I used an initial set of N_init = 8 participants, a pool of N_act = 5 active participants, and a percentile value of P = 20%. The figure illustrates the construction and performance of the active set as a function of time. “Quality metric” is the nomenclature I use to refer to what I called “score” above. Left: Each color represents a different participant, and the line style reflects the (in)activity as indicated by the legend. Line segments connecting two epochs where a participant changes from inactive to active show as solid, whereas those connecting a transition from active to inactive show as dotted. Middle: Correlation between the median quality metric of each participant (reflecting their ability or performance) and the fraction of the 1000-epoch duration of the simulation for which this participant is active. Colors match those in the left-hand panel. Right: EMA of the mean instantaneous quality metric across the active set for merit-based sortition (solid line) and random sortition (dotted line) over the full duration of the simulation. The panel titles list the number of initial participants, the number of active participants, and the percentile.

The left-hand panel covers the first 50 epochs of the experiment and showcases a variety of relevant properties exhibited by the sortition scheme. During the very first epoch, a random draw is made to select five active participants out of the total eight. The three inactive participants are assigned a quality metric corresponding to the percentile P, which is about −0.05 at i = 0. From this point emerges one solid line: this participant is promoted immediately, again through random selection as the three inactive participants have an indistinguishable quality metric EMA. Additionally, one dotted line emerges from {0, −0.05}, which represents the two remaining participants who are initially inactive – these continue to overlap, because their quality metric EMA is built from the same percentile of the active participants. The bottom-most dotted line represents a participant who is active at i = 0, but underperforms greatly and is relegated immediately. It subsequently slowly recovers, but remains queued behind the other two inactive participants.

After i = 26, a participant who promoted after i = 0 is relegated, allowing the first of the two remaining originally- inactive participants (selected at random) to get promoted. After i = 27, the final originally-inactive participant gets promoted – this time on merit, because there are no more identical participants. The participant relegated after i = 27 only slightly underperformed, and therefore is the first to (briefly) be promoted again at i = 33. As a result, it takes till after i = 33 for the participant who was relegated at i = 0 to be finally promoted back into the active set A_{34}. This participant fails to prove its worth, is relegated again after one attempt, and gets another chance in the active set only after i = 45. Three out of eight participants are active throughout the 50 epochs shown here. Unsurprisingly, these are among the ones with the highest median quality metric, even though the second-best participant was inactive at i = 0 and thus had to wait for promotion before getting a first chance. After 1000 epochs, this participant has never relinquished its spot in the active set and has been active for 97.3% of all epochs.

To test this idea further, I’ve now set up an experiment where the initial number of participants is N_init = 100 and participants constantly join and leave, at rates approximately in equilibrium with one another. This results in a variable number of participants, ranging between 90 and 110.


In this figure, colours indicate participants in order of their first participation (light to dark). Columns correspond to the same panels as in the previous figure.

In the first panel, we see really nicely how clear the promotion-relegation threshold becomes for a large number of participants.

Due to the evolving nature of the participant pool, the relatively well-defined correlation between participant performance and activity (second panel) is erased once we allow growth and attrition of participants. In dynamically evolving participant pools, the fraction of the total number of epochs that a participant is active is strongly influenced by the fraction of time they participate. However, the fraction of a participant’s participation time interval during which it is active does still correlate with its performance.

The third panel shows that the added value of merit-based sortition increases towards larger N_tot/N_act (compare to the figure in my previous post). It can be understood as a size-of-sample effect: for a larger number of participants, the active set is restricted to a smaller tail end of the distribution, which implies that the mean instantaneous quality metric is higher than the average across the entire pool.

The main free parameter is the percentile P, and this requires careful calibration. I repeated the last experiment just above across a grid of percentiles P = 0−100% with steps of 2% to see where the performance is best. For each experiment I took the right-hand panel of the two previous figures and calculated the time-averaged quality metric of the active set. This needs to be maximized. I also calculated the z-score, i.e. the absolute difference between merit-based and random sortition, normalized by the standard deviation of the mean underlying, unsmoothed instantaneous quality metrics. This gives an indication of how statistically significant the improvement is.

The results look like this:


This shows a highly statistically significant (> 2σ) improvement with merit-based sortition relative to classical sortition for P = 20−40%, and a mild (> 1σ) improvement for P = 10−85%.

In summary: to get a mild improvement, P doesn’t really matter, as merit-based sortition is always better than random sortition. But the biggest improvement is found for P = 20−40%.

Let me summarize – I think we came up with a good solution.

What we have here is a simple mechanism for merit-based sortition, which employs an EMA of a quality metric to robustly predict the expected performance during the current epoch. The EMA of inactive participants is updated using a chosen percentile of instantaneous (unsmoothed) quality metrics among the active set. This mechanism allows the selection of active participants from a much larger sample based on the expected quality of their work. This merit-based sortition scheme has a variety of favorable properties:

  1. The active set of participants is permeable and enables participants from the inactive set to enter and prove their worth. This permeability allows the algorithm to appropriately handle quality changes among the participants.
  2. The fraction of the active set that is replaced and the frequency with which this occurs is controlled by a single parameter 0 < P [%] ≤ 100, which represents the percentile of the quality metric among the active set that is used to update the metric’s EMA of inactive participants. High P implies a high depth and rate of replacement.
  3. Participants who consistently perform above the (100% − P)-th percentile of the active set are not at risk of being replaced and remain active. The consistency is critical, as the volatility of their performance must be sufficiently small to avoid incidental, poor performances that risk relegation.
  4. Participants performing below the P-th percentile of the active set are at risk of being replaced.
  5. The probability of an inactive participant to be promoted into the active set increases with its prior EMA-smoothed quality metric, the volatility of instantaneous quality metrics in the active set, the EMA smoothing factor, and the percentile P.
  6. The time required by relegated participants to return to the active set increases for more poorly performing participants, because these are separated further from the minimum quality metric among the active participants.
  7. It is possible to define an optimal percentile at which the mean instantaneous quality metric across the active set is maximized. For the experiments performed here, this is found to fall in the range P = 20−40% at > 2σ statistical significance, but the precise optimum may vary based on the quantitative distribution function of the participants’ median quality metrics. P = 25% provides a good initial value.

These properties are favorable, because they demonstrate that participants are involved in an active set based on the quality of their work. The merit-based sortition scheme shares the property of classical, random sortition that it draws from a much larger pool of participants, of which each individual has a non-zero probability to become part of the active set. As a result, for an infinite number of epochs each participant is guaranteed to have been part of the active set at least once. However, the key difference is that the probability of inclusion in the active set is variable and merit-based.

Tagging @Renata @steve @nick for :eyes: