Monitoring topic and validator reward health

I’m computing the Gini coefficient as follows:

import numpy as np

def gini_coefficient(x):
  # Mean absolute difference over all pairs of entries
  mad = np.abs(np.subtract.outer(x, x)).mean()
  # Relative mean absolute difference
  rmad = mad / np.mean(x)
  # The Gini coefficient is half the relative mean absolute difference
  g = 0.5 * rmad
  return g
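As a quick sanity check of this definition (my own illustrative values, not simulator output), a perfectly uniform vector gives 0 and a fully concentrated one gives (n-1)/n:

print(gini_coefficient(np.ones(10) / 10))             # 0.0: perfect equality
print(gini_coefficient(np.array([1.0] + [0.0] * 9)))  # 0.9 = (10-1)/10: one entry holds everything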

I am applying the Gini coefficient to the reward fractions for topics and validators, i.e., taking the rewards given to all topics and normalising by the total reward given to topics at that epoch (and similarly for validators). From Wikipedia, a Gini coefficient of 0 corresponds to perfect equality and 1 represents the situation where one individual gets all of the income. I started by looking at it on simulated data.

This doesn’t provide much insight on its own, so instead I simulate the network for 1000 epochs and store the reward fractions over time. For each reward-fraction vector (a pmf), I compute the Gini coefficient, the number of topics or validators, and the minimum and maximum entries of the reward fractions.
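A sketch of that bookkeeping, assuming reward_fractions_by_epoch is a list of per-epoch numpy arrays from the simulator (the name is mine):

summary = []
for fracs in reward_fractions_by_epoch:  # one pmf per epoch
  summary.append({
    "gini": gini_coefficient(fracs),
    "n": len(fracs),
    "min_frac": fracs.min(),
    "max_frac": fracs.max(),
  })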

In these simulations, if we vary the number of topics or validators and calculate the Gini coefficient:


We see that the Gini coefficient grows with N. I don’t think that is a good property for our metric: a Gini of 1 is unhealthy, but having more topics/validators is not, in general, unhealthy.

For the following scatter plots, I also compute the standard deviation of the log of the reward fractions and the log of the ratio of the maximum to the minimum reward fraction, alongside our metric:
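These spread measures come straight from one epoch’s reward-fraction array (here called fracs, as above; base 10 for the ratio is an assumption, matching the “10 times larger” threshold used later):

std_log_frac = np.std(np.log(fracs))               # stdev of log reward fractions
log_max_min = np.log10(fracs.max() / fracs.min())  # log10 of max/min reward fraction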


We see the same dependence on N in these plots as well.

I played around with several methods to try to normalise out this dependency on the number of topics or validators, to no avail. After some offline discussions with the team, I decided to look into entropy:

def entropy(pmf):
  # Shannon entropy in nats; assumes every entry of pmf is strictly positive
  return -np.sum(pmf * np.log(pmf))



It has the same issue: it grows with N (for roughly uniform reward fractions the entropy approaches its maximum, ln(N)). To mitigate this, we can normalise by ln(N), so the normalised entropy is bounded on [0, 1]. Here are the results:

This is better: there is much less dependence on N, and it only slightly decreases as N increases. The drawback is that the values are much more compressed (all within 0.5 of each other).
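For reference, the normalisation used here is just a one-line wrapper around the entropy function above (a minimal sketch):

def normalized_entropy(pmf):
  # Divide by the maximum possible entropy ln(N) so the result lies in [0, 1]
  return entropy(pmf) / np.log(len(pmf))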

All simulations above generated reward fractions using the network simulator. But “we expect power law distributions of wealth and thus topic or validator weight,” so I will also perform Monte Carlo (MC) simulations with a power law (PL).
I generated reward fractions from a PL with:

def generate_power_law(alpha, x_min, x_max, n_topics):
  # Generate n_topics values from a power-law distribution on [x_min, x_max]
  # using inverse transform sampling
  r = np.random.uniform(0, 1, n_topics)
  x = (x_min**(1 - alpha) + r * (x_max**(1 - alpha) - x_min**(1 - alpha)))**(1 / (1 - alpha))
  return x / np.sum(x)  # return as normalised reward fractions (a pmf)

With the parameters:

x_min = 1
log_xmax_xmin_range = np.arange(0.5, 3.5, 0.5)
alphas = [-1.01, -1.51, -2.01, -2.51, -3]

For each combination of parameters, I generate 1,000 samples. The results are below; points are coloured by alpha and by log(x_max/x_min).
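A sketch of the sweep (the bookkeeping and the choice of n_topics are mine; x_max is derived from the base-10 log ratio since x_min = 1):

import itertools

n_topics = 11  # example value; the decision-rule experiments below also use 500
records = []
for alpha, log_ratio in itertools.product(alphas, log_xmax_xmin_range):
  x_max = x_min * 10**log_ratio
  for _ in range(1000):
    pmf = generate_power_law(alpha, x_min, x_max, n_topics)
    records.append({
      "alpha": alpha,
      "log_xmax_xmin": log_ratio,
      "norm_entropy": normalized_entropy(pmf),
      "std_log_frac": np.std(np.log(pmf)),
      "log_max_min": np.log10(pmf.max() / pmf.min()),
    })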




To me this is behaving nicely. The normalised entropy clearly decreases as the max/min ratio and stdev increase.

Now, how do we develop a decision rule? With binary classification. Say we want to detect whether the log max/min ratio is bigger than 1 (the max reward fraction is more than 10 times larger than the minimum). Our decision rule is an entropy threshold, i.e., if the normalised entropy is smaller than X, then I conclude the log(max/min) ratio exceeds 1. For a given decision rule we can compute the usual classification metrics; here I am optimising the F1 score, which depends on precision and recall. So I sweep over a range of decision boundaries, compute their F1 scores, and pick the best decision boundary.
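A sketch of that threshold search (my own helper; it assumes boolean labels marking the “unhealthy” samples and that lower entropy means less healthy):

def best_entropy_threshold(norm_entropy, is_unhealthy, thresholds):
  # Classify "unhealthy" when normalised entropy falls below the threshold,
  # then keep the threshold with the highest F1 score.
  best_t, best_f1 = None, -1.0
  for t in thresholds:
    pred = norm_entropy < t
    tp = np.sum(pred & is_unhealthy)
    fp = np.sum(pred & ~is_unhealthy)
    fn = np.sum(~pred & is_unhealthy)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0
    if f1 > best_f1:
      best_t, best_f1 = t, f1
  return best_t, best_f1

The same search applies unchanged when the label is defined by the stdev criterion further below.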

Results for detecting whether the log max/min ratio is larger than 1:

n_topics = 11:



n_topics = 500:

Above, the resulting decision rules vary significantly depending on N.

We can also develop a decision rule that targets the stdev of log(fractions); maybe that is less sensitive to N.

Results for detecting if the stdev is larger than 0.5:

n_topics = 11:


n_topics = 500:

This is much less sensitive to N than the decision rule based on the max/min ratio.

Is it possible to determine a stdev of log fractions that we want to detect? Maybe I need to explore normalisation more; performance might be better if our x-axis criterion were also normalised based on N…? I could also perform the classification training on data pooled from different numbers of topics.
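For that last idea, one possible shape of the pooled training (a sketch; the specific n values, label criterion, and threshold grid are mine):

pooled_entropy, pooled_labels = [], []
for n in [11, 50, 100, 500]:  # example spread of topic counts
  for alpha, log_ratio in itertools.product(alphas, log_xmax_xmin_range):
    x_max = x_min * 10**log_ratio
    for _ in range(1000):
      pmf = generate_power_law(alpha, x_min, x_max, n)
      pooled_entropy.append(normalized_entropy(pmf))
      pooled_labels.append(np.std(np.log(pmf)) > 0.5)

threshold, f1 = best_entropy_threshold(
  np.array(pooled_entropy), np.array(pooled_labels), np.linspace(0, 1, 201))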