For the most part, the Allora network architecture is general enough to accommodate anomaly detection problems without major changes. As with classification tasks, inferences must be vector-valued: the i-th component represents the probability that the i-th element in a given list of data points is anomalous in that epoch.
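For concreteness, here is a minimal sketch of what such an inference could look like; the plain-list representation and variable names are illustrative, not the actual network message format:

```python
# Hypothetical example: four data points submitted for one epoch, and one
# worker's vector-valued inference. Entry i is the probability that point i
# is anomalous.
data_points = [3.1, 2.9, 41.7, 3.0]
inference = [0.02, 0.01, 0.97, 0.02]

# The inference must align with the submitted points, one probability each.
assert len(inference) == len(data_points)
assert all(0.0 <= p <= 1.0 for p in inference)
```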
The two main modifications required beyond the classification extension are:
1. Data Submission by Participants (Arguments):
Consumers (or potentially any participant) must be able to supply data to the network, which will then classify each point as anomalous or not. This requires an additional step before the “inference submission window,” where each interested consumer submits a list of data points. While the data type of these points should be fixed in the topic definition, the network itself does not interpret or process them beyond passing them on to participants. This flexibility allows for a wide range of anomaly detection applications.
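As a sketch, such a submission might carry something like the following; the field names and message shape are hypothetical, and the actual on-chain schema may differ:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DataSubmission:
    """Hypothetical consumer submission preceding the inference window."""
    topic_id: int        # topic whose definition fixes the data type of the points
    epoch: int           # epoch in which these points should be classified
    consumer: str        # address of the submitting consumer
    points: List[bytes]  # opaque payloads; the network forwards them unmodified
```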
2. Loss Functions Without a Fixed Ground Truth:
Many anomaly detection tasks do have a well-defined ground truth, but it is often unavailable in a timely manner. Other tasks may be inherently subjective or loosely defined, making it difficult to establish an agreed-upon ground truth at all. As a result, we need an alternative approach to assigning losses (and thus scores and regrets) to inferences.
One straightforward method is to have reputers collectively determine a ground truth and incentivize them based on consensus. This is what regression-type topics already do, so it would require no modification to the network. However, consensus-based approaches can struggle when participants have too much freedom in their assessments, much as optimization algorithms (e.g., gradient descent) perform better when initialized near an optimal solution.
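As a sketch of such a consensus, assuming a simple stake-weighted average over reputer reports (the network's actual consensus formula weights reputers differently, so this is illustrative only):

```python
import numpy as np

def consensus_ground_truth(reports: np.ndarray, stakes: np.ndarray) -> np.ndarray:
    """Stake-weighted consensus over reputer-reported anomaly labels.

    reports: shape (n_reputers, n_points), each entry in [0, 1].
    stakes:  shape (n_reputers,), each reputer's stake.
    Returns the per-point consensus probability of being anomalous.
    """
    weights = stakes / stakes.sum()
    return weights @ reports
```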
Alternatively, we could eliminate reputers entirely and allow topics to define an objective loss function that directly evaluates inferences. In this model, inference workers still provide value, since identifying the minimizer of a loss function can be computationally challenging; this is even more true of clustering tasks, which share similarities with anomaly detection. However, for most topics, this approach is highly susceptible to overfitting: designing a loss function that aligns well with the intended goal is a significant challenge, and workers will optimize whatever loss is stated rather than the goal it imperfectly captures.
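To illustrate the idea (and its weakness), here is one possible objective loss for scalar data points, borrowed from clustering: treat the inference as a soft two-way partition and penalize within-group variance. Nothing about this choice is mandated by the network, and a worker could game exactly this criterion without finding the anomalies the consumer actually cares about:

```python
import numpy as np

def objective_loss(points: np.ndarray, probs: np.ndarray) -> float:
    """Soft two-cluster within-group sum of squares; lower is better.

    points: shape (n,), scalar data points.
    probs:  shape (n,), per-point anomaly probabilities from a worker.
    """
    w_anom, w_norm = probs, 1.0 - probs
    mu_anom = (w_anom @ points) / max(w_anom.sum(), 1e-9)
    mu_norm = (w_norm @ points) / max(w_norm.sum(), 1e-9)
    return float(w_anom @ (points - mu_anom) ** 2
                 + w_norm @ (points - mu_norm) ** 2)
```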
We propose a hybrid approach that balances both methods:
- The topic defines an objective loss function with one or more adjustable parameters.
- Reputers engage in a consensus game to determine the optimal values for these parameters in each epoch.
This approach reduces the degrees of freedom in the consensus process, improving convergence while still allowing for adaptability. Compared to a fixed loss function, it introduces some level of uncertainty for inference workers, discouraging overfitting to a static loss metric.
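A minimal sketch of the hybrid scheme, reusing the clustering-style loss above but leaving one parameter to reputer consensus. The parameter, the loss itself, and the stake-weighted-median consensus rule are all assumptions for illustration:

```python
import numpy as np

def parameterized_loss(points: np.ndarray, probs: np.ndarray, tau: float) -> float:
    """Clustering-style loss with one free parameter tau, set by reputers.

    tau rescales how aggressively a probability counts as "anomalous"; it is
    the single degree of freedom left to the per-epoch consensus game.
    """
    w_anom = np.clip(probs / tau, 0.0, 1.0)
    w_norm = 1.0 - w_anom
    mu_a = (w_anom @ points) / max(w_anom.sum(), 1e-9)
    mu_n = (w_norm @ points) / max(w_norm.sum(), 1e-9)
    return float(w_anom @ (points - mu_a) ** 2 + w_norm @ (points - mu_n) ** 2)

def consensus_tau(proposals: np.ndarray, stakes: np.ndarray) -> float:
    """Stake-weighted median of the reputers' proposed parameter values."""
    order = np.argsort(proposals)
    cum = np.cumsum(stakes[order])
    return float(proposals[order][np.searchsorted(cum, 0.5 * cum[-1])])
```

Because the consensus value of tau changes each epoch, a worker cannot precompute the exact minimizer of the loss in advance, which is the intended deterrent against overfitting to a static metric.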