Price/returns topic feature engineering

Thanks all for voting in the poll! Looks like we have a clear list of three priorities, so let’s get working on these:

  • Include force and energy features (i.e. multi-timeframe Δ[close-open]/Δt, [close-open]**2, difference from MA, multi-timeframe linear gradients)
  • Add returns-focused feature set (all quantities you can calculate for price, but for log-returns)
  • Modify the training evaluation metric to match the ZPTAE loss function
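
To make the first item concrete, here is a minimal sketch of the force and energy features using only the Python standard library. The function names are hypothetical, and reading Δ[close-open]/Δt as "the change in the candle body over k bars, divided by k" is an assumption on my part, so adapt it to whatever definition you prefer. Running each function with several values of k or window gives the multi-timeframe variants.

```python
from statistics import mean


def body(opens, closes):
    """Per-bar candle body: close - open."""
    return [c - o for o, c in zip(opens, closes)]


def force(opens, closes, k):
    """Change in candle body over k bars, divided by k.

    Assumed reading of Δ[close-open]/Δt; the first k entries are None.
    """
    b = body(opens, closes)
    return [None] * k + [(b[i] - b[i - k]) / k for i in range(k, len(b))]


def energy(opens, closes):
    """Squared candle body: (close - open)**2."""
    return [x ** 2 for x in body(opens, closes)]


def diff_from_ma(values, window):
    """Value minus its trailing simple moving average (window >= 1)."""
    out = [None] * (window - 1)
    for i in range(window - 1, len(values)):
        out.append(values[i] - mean(values[i - window + 1 : i + 1]))
    return out


def linear_gradient(values, window):
    """Least-squares slope of values against bar index over a trailing window.

    Requires window >= 2 so that the slope is defined.
    """
    xs = range(window)
    x_bar = mean(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    out = [None] * (window - 1)
    for i in range(window - 1, len(values)):
        ys = values[i - window + 1 : i + 1]
        y_bar = mean(ys)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        out.append(sxy / sxx)
    return out
```

Entries that cannot be computed yet (not enough history) are returned as None so every feature stays aligned with the input bars.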

The way we should go about these is to perform an A/B test, i.e.:

  • use your own default model;
  • record its performance across a set of (sufficiently long) time intervals (more than one, so that differences can be tested for statistical significance);
  • develop one of the above modifications;
  • add this to your own model;
  • record the performance of the modified model across the same set of (sufficiently long) time intervals;
  • quantify any differences and assess their statistical significance.
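
As one possible way to do the last step, here is a minimal sketch of a paired sign-flip permutation test on the per-interval scores of the default and modified models. The choice of test and the function name are assumptions, not an agreed method; it simply has the advantage of needing no distributional assumptions and no third-party libraries. It assumes both score lists cover the same intervals in the same order.

```python
from itertools import product
from statistics import mean


def paired_permutation_test(baseline, modified):
    """Two-sided paired sign-flip permutation test.

    baseline, modified: per-interval scores (e.g. a loss) for the two models.
    Returns (mean difference modified - baseline, p-value).
    Enumerates all 2**n sign patterns, so keep n to roughly 20 or fewer.
    """
    diffs = [m - b for b, m in zip(baseline, modified)]
    observed = abs(mean(diffs))
    hits = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(mean(s * d for s, d in zip(signs, diffs)))
        # Small tolerance so ties with the observed value count as hits.
        if flipped >= observed - 1e-12:
            hits += 1
    return mean(diffs), hits / total


# Illustrative, made-up numbers only: loss on four intervals for each model.
mean_diff, p_value = paired_permutation_test(
    [1.0, 1.0, 1.0, 1.0], [0.9, 0.8, 0.7, 0.6]
)
```

With n intervals the smallest two-sided p-value this test can produce is 2/2**n, so four intervals can never go below 0.125 and at least six are needed to get under 0.05. That is a practical reason to use more than a handful of intervals.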

We can collectively define some of the unknowns in the above plan (e.g. which time intervals, how long, which metrics) and I suggest you just propose what you’d like to use.

It’s great that we have three model builders involved in the discussion already (@t-hossein @phamhung3589 @its_theday). Given that each of your models is quite different (and uses a different feature set), can I suggest that we all work through the above ideas together, one at a time? We pick one, each run the A/B test for it, and compare results. That way, we also test the robustness of these ideas under differing modelling approaches, which I think could be very useful. Given that we’re looking at historical data for these tests, we can continue to use the PAXG target, but if any of you would like to switch to the target of one of the new Forge topics (e.g. BTC), please let us know. Of course, more model builders are welcome to join at any time!

I would then like to suggest we start with “Add returns-focused feature set (all quantities you can calculate for price, but for log-returns)”. My reasoning is that this is a relatively small amount of work to try (applying the transformations you are already using to another variable). Just be sure they’re sensible in this context: a log-return is a two-point quantity (expressing a difference between two moments), whereas a price is a one-point quantity (it exists at any given moment in time). For instance, it makes sense to apply moving averages, RSI, MACD, Bollinger Bands (and many other TA indicators) to log-returns, but some other indicators, e.g. those relying on volume information or open-close data, may not carry over.
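
As a starting point, here is a minimal sketch of computing log-returns and then applying two price-style transformations to them: a trailing moving average and a Bollinger-style z-score. The function names are hypothetical, and using the population standard deviation for the band width is a choice made here, not a requirement.

```python
import math
from statistics import mean, pstdev


def log_returns(prices):
    """Log-return between consecutive prices: ln(p[t] / p[t-1]).

    The result has one element fewer than prices, because each
    return needs two price points.
    """
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]


def rolling_mean(values, window):
    """Trailing simple moving average; None until the window is full."""
    out = [None] * (window - 1)
    for i in range(window - 1, len(values)):
        out.append(mean(values[i - window + 1 : i + 1]))
    return out


def bollinger_z(values, window):
    """Bollinger-style z-score: (value - rolling mean) / rolling std.

    Returns None where the window is not full or the std is zero.
    """
    out = [None] * (window - 1)
    for i in range(window - 1, len(values)):
        chunk = values[i - window + 1 : i + 1]
        sd = pstdev(chunk)
        out.append((values[i] - mean(chunk)) / sd if sd > 0 else None)
    return out


# Illustrative, made-up prices only.
prices = [100.0, 101.0, 99.5, 100.5, 102.0]
r = log_returns(prices)
features = {
    "ret_ma_3": rolling_mean(r, 3),
    "ret_bollinger_z_3": bollinger_z(r, 3),
}
```

Note that the return series is one element shorter than the price series, so the features need to be aligned with the later of the two timestamps each return spans before joining them to price-based features.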

If you think this is a good plan and you’ll participate, just like this post and let’s get going!
