As detailed in previous discussions about losses for returns prediction topics, the ZTAE (Z-score Tanh Absolute Error) loss function saturates at extreme inference values. The ZPTAE (Z-score Power-Tanh Absolute Error) loss was introduced to address this by replacing the hyperbolic tangent with a modified version that transitions to a power law at large values, preventing saturation for outliers. However, further testing revealed additional considerations for inference synthesis that motivated the development of the CZAR loss.
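To make the ZTAE/ZPTAE distinction concrete, here is a minimal sketch of the idea. The exact ZPTAE transition point, exponent `p`, and z-scoring scale `sigma` are assumptions for illustration, not the definitions used in the network:

```python
import numpy as np

def ztae(y_pred, y_true, sigma=1.0):
    # Z-score both values, then take the absolute error of their tanh transforms.
    # tanh saturates, so all extreme z-scores map to roughly +/-1 and the loss
    # between two large same-sign values is nearly zero.
    return np.abs(np.tanh(y_pred / sigma) - np.tanh(y_true / sigma))

def power_tanh(z, p=1.5):
    # tanh near the origin, transitioning to a |z|**p power law for |z| > 1
    # (continuous at |z| = 1), so the transform keeps growing for outliers.
    z = np.asarray(z, dtype=float)
    small = np.abs(z) <= 1.0
    large = np.sign(z) * (np.tanh(1.0) + np.abs(z) ** p - 1.0)
    return np.where(small, np.tanh(z), large)

def zptae(y_pred, y_true, sigma=1.0, p=1.5):
    # Same structure as ZTAE, with the saturating tanh swapped for power_tanh.
    return np.abs(power_tanh(y_pred / sigma, p) - power_tanh(y_true / sigma, p))
```

With this sketch, `ztae(10, 5)` is tiny because both tanh values are pinned near 1, while `zptae(10, 5)` stays large, which is exactly the outlier behaviour described above.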
In returns prediction topics, the network often puts too much weight on constant predictions with values near the mean (and the same happens when training machine learning models). The most obvious illustration of this is that predicting zero returns is technically better than drawing predictions from the same PDF as the ground truth (a direct consequence of increasing the dimensionality of the problem):
Another way to visualise this is by comparing the integrated expected log loss (since inference synthesis uses log loss) for a Gaussian “true” returns distribution as a function of the constant predicted returns value, for various loss functions. Common loss functions (e.g. MSE, MAE) have a clear minimum in the expected log loss at predicted returns = 0 (solid lines). Circle points indicate the integrated loss for a model that randomly draws predictions from the same PDF as the true returns distribution. In general, most loss functions clearly favour predicting zero over drawing from the same PDF.
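The zero-versus-same-PDF gap is easy to verify numerically. A minimal Monte-Carlo sketch under MSE (standard-normal returns assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
y_true = rng.standard_normal(n)  # Gaussian "true" returns

# Model A: constant prediction at the mean (zero returns).
mse_zero = np.mean((0.0 - y_true) ** 2)

# Model B: draw predictions independently from the same PDF as the truth.
y_same_pdf = rng.standard_normal(n)
mse_same = np.mean((y_same_pdf - y_true) ** 2)

print(mse_zero, mse_same)  # ~1.0 vs ~2.0
```

The constant-zero model scores an expected MSE equal to the variance (1), while the same-PDF model scores the variance of the difference of two independent draws (2), so under MSE the "lazy" constant model wins by a factor of two.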
ZTAE/ZPTAE are better loss functions in this regard because they are designed to better reward close predictions when the true value is far from the mean, and indeed their expected log loss is flatter near zero. This is because the asymmetry of the loss functions shifts depending on the true return value. However, both loss functions have issues:
- ZTAE flattens for large values, meaning extreme outliers can receive relatively low losses if they are in the right direction.
- ZPTAE was created to address this issue with ZTAE, but both ZTAE and ZPTAE are largely concave (Hessian < 0), so they cannot be used for model training.
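The concavity is straightforward to check numerically: the second derivative of tanh is negative for positive arguments, so a tanh-based absolute error curves the wrong way away from the true value. A central-difference sketch:

```python
import numpy as np

# Second derivative of tanh(z) via central differences. Analytically it is
# -2*tanh(z)*sech(z)**2, which is strictly negative for z > 0, so a loss like
# |tanh(z_pred) - tanh(z_true)| is concave in the prediction on that side.
z = np.linspace(0.1, 4.0, 200)
h = 1e-4
d2 = (np.tanh(z + h) - 2.0 * np.tanh(z) + np.tanh(z - h)) / h**2

print(bool(np.all(d2 < 0)))  # True: negative curvature everywhere sampled
```

Gradient-based training on a concave loss pushes predictions toward the flat tails rather than the minimum, which is why convexity appears in the requirements below.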
The zero-returns issue can be somewhat alleviated by adding a constant ‘smoothing’ term to the loss function (dashed lines in the above figure). It increases the integrated log loss most at predicted returns = 0, but doesn’t sufficiently flatten the integrated expected losses to disfavour predicting the mean.
We can take this idea further by introducing smoothing that scales inversely with the absolute true return value, i.e. the smoothing is maximal when the true value is at the mean and decreases as the absolute true return grows. Applying this to the ZPTAE loss function shows how it produces a non-zero floor in the loss for true returns near zero:
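A minimal sketch of adaptive smoothing. The Gaussian decay shape, floor height `eps`, and width `scale` are assumptions for illustration; the source does not specify the exact functional form:

```python
import numpy as np

def adaptive_smoothing(z_true, eps=0.2, scale=1.0):
    # A floor that is largest when the true return sits at the mean (z_true = 0)
    # and decays as |z_true| grows; a Gaussian decay is one assumed choice.
    return eps * np.exp(-(np.asarray(z_true, dtype=float) / scale) ** 2)

def smoothed_loss(base_loss, z_true, eps=0.2, scale=1.0):
    # Add the adaptive floor to any base loss (e.g. ZPTAE), so that true
    # returns near zero can no longer yield near-zero losses for a constant
    # zero-returns model.
    return base_loss + adaptive_smoothing(z_true, eps, scale)
```

Because the floor is added regardless of the prediction, a constant-zero model still pays `eps` whenever the true return is near the mean, while predictions on genuinely extreme true returns are barely affected.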
Adding this to the integrated log loss test (zptae_scaled_smooth), we see it is so effective that there is now a local peak at returns = 0, and the curve is otherwise very flat between ±1. So with adaptive smoothing we can level the playing field between a zero-returns model and a ‘same PDF’ model (dots) in the network.
To summarise, we want a loss function for returns topics that:
- Is asymmetric and rewards predictions when the true value is far from the mean (ZTAE/ZPTAE-like)
- Tends toward infinity for large differences between predicted and true values, to adequately handle large outliers
- Has adaptive smoothing to down-weight constant returns ≈ 0 models
- Is convex, so it can be used for training models
We tested versions of asymmetric linear and quadratic functions (modifying the gradients for predictions on opposite sides of the true value) and a sigmoid gradient function (shifted vertically depending on the true value), both with adaptive smoothing, but found they did not outperform the ZPTAE function. We attributed this to insufficient steepness about the true value (i.e. these loss functions were too wide), so to the above points we can add:
- Has a sharp change in the gradient function about the true value (ZTAE/ZPTAE-like)
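For reference, here is a sketch of the asymmetric-quadratic idea tested above: a quadratic loss whose gradient magnitude differs on either side of the true value. The coefficients and the rule tying the asymmetry to the sign of the true return are hypothetical, not the exact variants that were tested:

```python
import numpy as np

def asymmetric_quadratic(y_pred, y_true, a_steep=2.0, a_shallow=0.5):
    # Quadratic loss with different curvature on each side of the true value.
    # The steep side is placed between the true value and zero, so predictions
    # that retreat toward the mean are penalised more than overshoots -- the
    # asymmetry flips with the sign of y_true, mimicking ZTAE/ZPTAE behaviour.
    d = np.asarray(y_pred, dtype=float) - y_true
    toward_mean = np.sign(d) != np.sign(y_true) if y_true != 0 else d < 0
    return np.where(toward_mean, a_steep * d**2, a_shallow * d**2)
```

Being piecewise quadratic, this stays convex and unbounded, but its gradient changes only gradually near the true value; per the conclusion above, that lack of a sharp gradient change is the suspected reason it underperformed ZPTAE.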