I’ve noticed some strange behaviour of the losses in topics predicting log-returns. Some workers occasionally provide extremely large inferences (up to ~10^12) compared to typical return values (<0.1), but still have reasonable values for their losses:
So at some point the losses flatten out, and inferences very far from the true value can have a lower loss than those close to the true value. That seems like unintended behaviour to me. Could it be an issue with the loss function?
Yeah we originally took the loss function for log-returns topics from the OpenGradient page here. It could indeed be related to the functional form of the (M)ZTAE loss function, because it flattens at large absolute deviations. It uses a tanh, which is symmetric and saturates at ±1.
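For concreteness, here’s a minimal sketch of how I understand that functional form (the name ztae_loss and the exact signature are just illustrative, and the actual OpenGradient implementation may differ in detail): both the inference and the ground truth are z-scored against a reference mean and standard deviation, squashed with tanh, and the loss is the absolute difference.

import numpy as np

def ztae_loss(y_pred, y_true, mu=0.0, sigma=1.0):
    # Sketch of a ZTAE-style loss: z-score both values, squash with tanh,
    # and take the absolute difference. tanh maps both z-scores into (-1, 1),
    # so the loss saturates once |z| is much larger than 1.
    z_pred = (y_pred - mu) / sigma
    z_true = (y_true - mu) / sigma
    return np.abs(np.tanh(z_pred) - np.tanh(z_true))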
Probably we don’t want it to saturate, but instead to keep growing with a shallower power-law behaviour. The knee should then stay at the same sigma as it is now, and the power-law slope should be a parameter we control. But all of this depends on whether this hypothesis is correct. What would the above figure look like for a standard MSE loss?
I see. I made a quick figure of the ZTAE loss function for different true values, with mean=0 and standard deviation=1. So the function becomes more asymmetric as the true value increases away from the mean. But because it flattens off, “infinite” values in the correct direction (relative to the mean) can receive quite low losses, which is what we’re seeing above.
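Roughly how such a figure can be generated, using the ztae_loss sketch above (the true values here are just illustrative examples):

import matplotlib.pyplot as plt

x_pred = np.linspace(-5, 5, 1000)
for x_true in [0.0, 0.5, 1.0, 2.0]:
    # Loss as a function of the inference, for a few example true values, with mean=0 and std=1
    plt.plot(x_pred, ztae_loss(x_pred, x_true, mu=0.0, sigma=1.0), label='x_true = ' + str(x_true))
plt.xlabel('inference')
plt.ylabel('ZTAE loss')
plt.legend()
plt.show()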
This is how the losses look for a topic with an MSE loss function. The scatter around the expected loss function is, I think, related to uncertainty in how the ground truth is defined (see here), plus differences between data providers. So the “true” values I’ve used (from Tiingo, rounded to the nearest minute) might be slightly different from what the reputers used, which is why the scatter increases as the difference from the ground truth decreases. Still, inferences whose differences are much larger than the ground-truth uncertainty should be largely unaffected, and indeed they have the largest losses.
Oh wow, that’s a great visualisation: so if x_true = 1, then x = +infinity receives a better loss than x < 0.5 in this case. That’s the somewhat ridiculous consequence of the tanh saturation.
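A quick numerical check with the ztae_loss sketch above (again assuming that functional form, with mean=0 and std=1) reproduces this:

x_true = 1.0
print(ztae_loss(1e12, x_true))  # ~0.24: an effectively 'infinite' inference, yet a fairly low loss
print(ztae_loss(0.5, x_true))   # ~0.30: much closer to the true value, but a higher loss
print(ztae_loss(0.0, x_true))   # ~0.76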
Sounds like a power-law (PL) modification of the tanh could do wonders. For instance, if we use this:
import numpy as np
import matplotlib.pyplot as plt

def smooth_power_tanh_general(x, alpha=0.5, beta=2.0, x0=1.0):
    # Linear (like tanh) for |x| << x0; beyond the knee at x0 it follows a power law
    # with log-log slope alpha when beta=2: alpha=0 saturates like tanh, alpha=1 stays linear.
    # np.abs keeps the function odd-symmetric and well-defined for any beta.
    y = x / (1 + np.abs(x / x0)**beta)**((1 - alpha) / 2)
    return y

xtest = np.linspace(-5, 5, 1000)
plt.plot(xtest, np.tanh(xtest), ':k', lw=2, label='tanh')
plt.ylim((-5, 5))
for alpha_test in np.array([0.25, 0.5, 1]):
    ytest = smooth_power_tanh_general(xtest, alpha=alpha_test)
    plt.plot(xtest, ytest, label='alpha = ' + str(alpha_test))
plt.legend()
plt.show()
Here are some tests replacing tanh in the ZTAE loss function with a power-tanh function.
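Concretely, the replacement looks something like the sketch below (power_ztae_loss is just an illustrative name, and the real topic loss may have additional pieces around it): the z-scoring stays the same and only the squashing function changes, so the knee still sits at one sigma while the loss keeps growing beyond it.

def power_ztae_loss(y_pred, y_true, mu=0.0, sigma=1.0, alpha=0.25, beta=2.0, x0=1.0):
    # Same structure as the ztae_loss sketch above, with tanh swapped for the power-tanh,
    # so the loss keeps rising (log-log slope ~alpha for beta=2) instead of saturating.
    z_pred = (y_pred - mu) / sigma
    z_true = (y_true - mu) / sigma
    return np.abs(smooth_power_tanh_general(z_pred, alpha=alpha, beta=beta, x0=x0)
                  - smooth_power_tanh_general(z_true, alpha=alpha, beta=beta, x0=x0))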
Regarding the best value for alpha, it seems there’s a trade-off between larger loss values for inferences far from the ground truth (higher alpha) and keeping the asymmetric behaviour of the tanh function (lower alpha).
Another way to see this is to look at a version of the above figure that’s ‘folded’ about the true value, so that for each function the lower line corresponds to inferences in the same direction as the true value.
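A sketch of what I mean by ‘folded’, building on the snippets above: plot the loss against the distance |x - x_true|, with one line for inferences on the same side as the true value (relative to the mean) and one for the opposite side.

x_true, mu, sigma = 1.0, 0.0, 1.0
delta = np.linspace(0, 5, 500)  # distance of the inference from the true value
for alpha in [0.1, 0.25, 0.5]:
    same_side = power_ztae_loss(x_true + delta, x_true, mu, sigma, alpha=alpha)
    opposite = power_ztae_loss(x_true - delta, x_true, mu, sigma, alpha=alpha)
    plt.plot(delta, same_side, label='alpha = ' + str(alpha) + ' (same direction as true value)')
    plt.plot(delta, opposite, '--', label='alpha = ' + str(alpha) + ' (opposite direction)')
plt.xlabel('|inference - true value|')
plt.ylabel('loss')
plt.legend()
plt.show()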
Based on this, I think alpha=0.1 is too close to the initial tanh function, so it will still have many of the same issues because it remains quite flat past the ‘knee’ of the function. alpha=0.5 might be too close to a pure power law: there’s not much difference between the positive and negative directions, so it may not sufficiently reward predictions in the right direction.
That leaves alpha ~= 0.25 as a nice middle ground?
If we apply this power-tanh loss function to the initial returns-topic data, we get the following median losses, which I think is a good improvement over the current tanh function!
Perfect! We can keep alpha (and plausibly some other moving parts) as free parameters, set alpha = 0.25 as the default, and keep a close eye on how this improves the network inference. Glad we got this sorted so quickly!