A follow-up on the post “Determine guidelines for choosing alpha_regret”
Imagine a worker gets much better or much worse. We don’t want to overreact if that change is incidental, but we want to catch it early if it is systematic.
To summarize, our goal is to set the parameter α_regret for each worker individually, letting α_regret evolve over time based on the worker’s reliability. Initially, α_regret might be set to 0.1 and then adjusted as needed so that it can potentially reach an equilibrium.
Background & intuition:
α_regret is a parameter in the Exponential Moving Average (EMA) that is inversely proportional to a time constant τ, the period it takes for the EMA to reflect approximately 63.2% of a step change in the input data (mathematically, it’s a random variable). We will refer to τ as the window size. It is logical for τ to depend on systematic fluctuations in a worker’s regret. For instance, in scenarios where there is gradual improvement from a cold start, maintaining a small α_regret would result in the network retaining the cold-start effects for an extended duration. Therefore, we’d expect α_regret to increase with improvements in worker performance.
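For reference, the standard EMA step-response identity links α_regret and τ (the 63.2% figure above is 1 − 1/e); this is a textbook relation, not something specific to this proposal:

```latex
% After \tau epochs an EMA has absorbed a fraction 1-(1-\alpha)^{\tau} of a step change.
% Setting this fraction equal to 1 - 1/e and solving for \tau:
\tau = \frac{-1}{\ln(1-\alpha_{\mathrm{regret}})} \approx \frac{1}{\alpha_{\mathrm{regret}}}
\qquad \text{for small } \alpha_{\mathrm{regret}} .
```

For example, α_regret = 0.1 corresponds to τ ≈ 9.5 epochs, while α_regret = 0.01 corresponds to τ ≈ 100 epochs.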
To adjust α_regret, let’s consider a measure of volatility of the current regrets. If it is high, we should opt for a smaller α_regret to avoid overemphasizing temporary outperformance. Conversely, if a worker has consistently shown improvement after an initial rough start, we should increase α_regret to give more weight to their recent performance.
Computational Specifics:
Given the impracticality of storing regret data from all previous epochs, we adopt an Exponentially Weighted Moving Average (EWMA) approach. We define the volatility for each epoch i as follows:
volatility_i = [λ * r_i^p_regret + (1 - λ) * volatility_(i-1)^p_regret]^(1/p_regret),
where r_i = ([current_regret at epoch i-1] - [current_regret at epoch i])_+ = max([current_regret at epoch i-1] - [current_regret at epoch i], 0)*. This formulation captures only the reduction in current regret over a single epoch, since an increase in regret is advantageous and should not be penalized. Indeed, greater emphasis should be placed on workers whose predictions consistently improve over time, rather than on those whose performance remains static.
*Also note that we refer to the log-loss difference in equation (15) in the Whitepaper as “current_regret”. This is so that we (a) capture the most recent change in worker performance, unaffected by historical data, and (b) avoid circularity in the calculation of volatility_i: volatility_i influences α_regret, which is in turn used in equation (15) to determine \cal R_il, so volatility_i should not itself depend on \cal R_il.
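As a concrete illustration, here is a minimal sketch of this volatility update (the function and variable names are hypothetical, not from the Whitepaper or the simulator; the defaults p_regret = 2 and λ = 0.2 simply echo the values discussed below):

```python
def positive_regret_drop(prev_regret: float, curr_regret: float) -> float:
    """r_i: only a reduction in current_regret contributes; an increase is ignored."""
    return max(prev_regret - curr_regret, 0.0)


def update_volatility(prev_volatility: float, r_i: float,
                      lam: float = 0.2, p_regret: float = 2.0) -> float:
    """EWMA volatility update:
    volatility_i = [lam * r_i^p + (1 - lam) * volatility_{i-1}^p]^(1/p)."""
    return (lam * r_i ** p_regret
            + (1.0 - lam) * prev_volatility ** p_regret) ** (1.0 / p_regret)


# Example: a worker's current_regret drops from 0.8 to 0.5 over one epoch.
vol = update_volatility(prev_volatility=0.1, r_i=positive_regret_drop(0.8, 0.5))
```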
To proceed, we need to establish optimal values for the parameters λ and p_regret. I propose initiating our experiments with p_regret set to 2, as it represents a natural starting point. The choice of λ depends on the data frequency and desired responsiveness. After selecting a reasonable value for λ, our next step will be to define a functional form for the dependency of α_regret on volatility. Right now our only insight is that it should be a decreasing function with values ranging between 0 and 1.
Focusing on λ for now, our initial simulator setup tracks the volatility and r_i over the last 100 epochs, labeled as “Deviation” and “Returns” respectively on the plots. The plots below display these metrics for each worker across various λ values, using a log-scale to enhance readability. However, we intend to revise the simulator design to more accurately model sudden changes in workers’ behavior. Based on these observations, a λ value between 0.1 and 0.3 seems to be a reasonable choice.
After testing various potential functional forms for α_regret, we’ve identified an issue: α_regret can only be sensitive to changes in volatility within a certain range. Given the uncertainty about the expected volatility ranges, we’re now trying a new approach to handle systematic changes in network participant behavior: instead of fixing the range, we adapt the function α_regret(volatility) itself.
Controlling the ranges is important because if α_regret values become too extreme, they can skew how the network evaluates different workers, impacting network performance. To address this, we are developing a dynamic form of α_regret that adjusts according to the typical volatility range of the workers. Specifically, one needs to define a transformation that takes a volatility value and outputs an appropriate α_regret value. Initially, we use a simple piecewise-linear function (see the figure below), which we will refer to as the transformation ‘g’.
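As a rough sketch, the transformation g could look like the following (the bounds alpha_max and alpha_min are illustrative placeholders chosen to echo the non-dynamic values used later, not the final choices):

```python
def g(volatility: float, vol_min: float, vol_max: float,
      alpha_max: float = 0.1, alpha_min: float = 0.001) -> float:
    """Piecewise-linear map from volatility to alpha_regret: alpha_max below
    vol_min, alpha_min above vol_max, linearly decreasing in between."""
    if vol_max <= vol_min:
        # Degenerate range (e.g. a single worker): fall back to the upper limit.
        return alpha_max
    if volatility <= vol_min:
        return alpha_max
    if volatility >= vol_max:
        return alpha_min
    frac = (volatility - vol_min) / (vol_max - vol_min)
    return alpha_max - frac * (alpha_max - alpha_min)
```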
We’re looking at regrets = log-loss differences, so we’d expect similar changes in regret (on the same scale) for the same type of worker, whether they are inferers or forecasters. Initially, I thought about assigning each worker a typical range, but this could make α_regret too sensitive to individual volatility spikes, which is unnecessary when, say, everyone is having a bad day. This led me to consider using an EMA for α_regret as well, to smooth out these fluctuations and avoid overreactions.
Another argument in favor of using an EMA for α_regret: for small networks, a range based solely on current volatility can lead to extreme and noisy α_regret values. So maybe we should include volatilities from the previous epoch: that would broaden the data set and smooth the ranges. However, if the current volatilities are higher/lower than the previous volatilities, this would result in new α_regret values being lower/higher, respectively, than if the range were determined based solely on the current volatilities. So again, an EMA approach would be useful to smooth out and mitigate this.
We’ve also updated the simulator setup to better model the fluctuations we’re trying to address. Specifically, we added “good/bad days” for each worker to simulate more realistic behavior.
- Dynamic volatility: updated through a p_normalized EMA.
- α_regret: also updated using an EMA, but in log-space.
- g: a function of the dynamic volatility; it is piecewise linear, decreasing linearly from a specified upper limit on α_regret to a lower limit within the volatility range, and constant outside of this range.
- Volatility range: defined by the min/max volatility values at each epoch for each type of worker, i.e., it is calculated separately for inferers and forecasters.
We have adopted upper and lower bounds on alpha_regret based on the experiments for non-dynamic alpha_regret. We then vary the parameters that define how volatility is computed, and compare the network’s prediction losses by averaging across different simulator runs. Additionally, we include a data point for a default setup with non-dynamic alpha_regret to serve as a baseline for comparison.
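Putting the pieces together, one epoch of the dynamic update might be sketched as follows (reusing the g function from the earlier sketch; the log-space EMA is implemented here as geometric smoothing with a hypothetical factor beta, which is one possible reading of “EMA in log-space”):

```python
import math

def update_alpha_regret(prev_alpha: float, volatility: float,
                        vols_same_type: list[float],
                        alpha_max: float = 0.1, alpha_min: float = 0.001,
                        beta: float = 0.2) -> float:
    """One epoch of the dynamic alpha_regret update (illustrative sketch):
    1) volatility range = min/max over workers of the same type
       (computed separately for inferers and forecasters);
    2) target alpha = g(volatility) via the piecewise-linear transformation;
    3) smooth alpha_regret in log-space with an EMA to avoid overreaction."""
    vol_min, vol_max = min(vols_same_type), max(vols_same_type)
    target_alpha = g(volatility, vol_min, vol_max, alpha_max, alpha_min)
    log_alpha = beta * math.log(target_alpha) + (1.0 - beta) * math.log(prev_alpha)
    return math.exp(log_alpha)
```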
Conclusion:
There is no statistically significant benefit in using adaptive alpha_regret. I suspect this stems from the fact that there is no difference in network performance when alpha_regret is varied across the values 0.1, 0.01, and 0.001 in the experiments above. Hence, varying dynamic alpha_regret within that range is not going to do much either.
We conclude that introducing dynamic alpha_regret in the current setting is redundant. Specifically, performance is not affected when non-dynamic alpha_regret is varied on a large scale; hence, dynamically varying alpha_regret over the same range seems futile. At this point we don’t know what sort of behavior to expect from the network participants in the live system. Therefore, our recommendation is to monitor the live network for a while and, if needed, explore further any possible benefit of using adaptive alpha_regret. A simple diagnostic would be to test network performance using multiple values of alpha_regret: significant differences might indicate the need for an adaptive alpha_regret that is sensitive to historical volatility.
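For concreteness, such a diagnostic could look roughly like this (run_simulation is a hypothetical placeholder for whatever evaluation harness is available; only the structure of the comparison is intended):

```python
import statistics

def alpha_regret_sensitivity(run_simulation, alphas=(0.1, 0.01, 0.001), n_runs=20):
    """Mean network loss per fixed alpha_regret value, averaged over repeated runs.
    Large differences between the means would suggest revisiting adaptive alpha_regret."""
    return {
        alpha: statistics.mean(run_simulation(alpha_regret=alpha) for _ in range(n_runs))
        for alpha in alphas
    }
```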
Thank you for the careful analysis. I think your conclusion makes a lot of sense. Further network monitoring will help us assess under which circumstances an adaptive alpha_regret might boost network performance.