Designing a Machine Learning model for forex prediction

Discussion in 'Automated Trading' started by mmutoo, Dec 24, 2023.

  1. destriero

    Multiple feeds, when everyone has been employing triangular arbitrage for decades and is using EBS as a reference feed? You tw*t.
     
    #11     Dec 26, 2023
  2. destriero

    Go with hourly or preferably 4H.
     
    #12     Dec 26, 2023
  3. maciejz

    Your model is WAY too complex. Four layers with 50 neurons each is over 10K parameters. How large is your dataset? You’d want thousands of data points per parameter because of the high noise levels. So, unless you have over 10 million rows of data, your model will produce pretty random results out of sample.
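
    As a rough back-of-envelope, here is roughly where that parameter count comes from (this sketch assumes about 50 input features, which is just a guess for illustration, not your actual setup):

        # Rough parameter count for a fully connected net with 4 hidden layers of 50 neurons
        # (an input dimension of 50 is an assumption for illustration)
        n_in, n_hidden, n_layers = 50, 50, 4

        params = n_in * n_hidden + n_hidden                           # input -> first hidden layer
        params += (n_layers - 1) * (n_hidden * n_hidden + n_hidden)   # remaining hidden layers
        params += n_hidden + 1                                        # last hidden layer -> single output
        print(params)                                                 # 10251 under these assumptions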

    Making the model too complex is a pretty common rookie mistake in ML. The temptation is to create the “greatest” model ever, which turns into the most complex model. But there is something called the bias/variance trade-off; it is one of the fundamentals of ML. Quite frankly, getting the model complexity right is quite challenging, and it is one of the reasons why ML is sometimes referred to as an art.

    There are lots of books to help get you started with ML, but one of the best ones focused on these foundational concepts is “Learning from Data: A Short Course” by Yaser Abu-Mostafa et al.

    Applying ML to trading is non-trivial. If it were easy, everyone would be doing it :) One of the main things it requires is knowing ML very well. I don’t mean knowing how to use the available libraries; that’s not knowing ML :) Even knowing enough ML to implement the algorithms (NN or gradient boosted trees) yourself is not enough, although it is a necessary step. You really need to know what is going on underneath the covers, because what’s going on there is not magic :) If you don’t, then you’re just throwing shit against the wall. And there are plenty of people who do that, but their results out of sample look nothing like what they expect.

    My other piece of advice would be to forget neural networks and deep learning. Yes, they are very powerful and all that, but they are not the most efficient approach to the trading problem, which is a tabular data problem. Gradient boosted trees are pretty much the state of the art for tabular data problems. This doesn’t mean that you can throw XGBoost or LightGBM at your data and have a tradeable model. It’s just that most people have an easier time understanding what’s going on in a tree model than in a neural network; also, gradient boosted trees should be way faster to train than NNs. And if you understand what’s happening underneath the covers, then you’ll understand that both approaches should produce very similar models if you have things tuned properly.
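
    Just to show the shape of that route, here is a minimal sketch (X and y are stand-in arrays, the hyper-parameters are illustrative rather than tuned, and this is nowhere near a tradeable model):

        import numpy as np
        import xgboost as xgb

        rng = np.random.default_rng(0)
        X = rng.normal(size=(5_000, 10))   # stand-in feature matrix
        y = rng.normal(size=5_000)         # stand-in forward-return target (pure noise here)

        # keep the trees shallow and heavily regularized for noisy financial data
        model = xgb.XGBRegressor(
            n_estimators=200,
            max_depth=3,
            learning_rate=0.05,
            min_child_weight=1_000,   # crude proxy for a minimum split size
            subsample=0.8,
        )
        model.fit(X, y)
        print(model.predict(X[:5]))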

    Anyhow, good luck on your journey, it’s a fascinating one.
     
    #13     Feb 21, 2024
  4. mmutoo

    @maciejz Thank you for the detailed reply. I am actually not completely new to the ML field, and I have read the book you mentioned. Regarding the number of samples, I have about 500K, and I monitored the learning process to make sure the model is not overfitting. I also tested XGBoost and Random Forest, which gave me similar results. I even tried an ensemble method. Anyway, I will look into what you mentioned and dive deeper.
     
    #14     Feb 22, 2024
  5. mmutoo

    Thanks. Will try that one too.
     
    #15     Feb 22, 2024
  6. maciejz

    How deep were your trees in XGBoost? Also, I know that some implementations don’t respect a minimum split size, which is going to produce more random results.
     
    #16     Feb 22, 2024
  7. mmutoo

    I don't remember, since it was a long time ago. I will check. Is there a rough number for the minimum split size with that amount of data?
     
    #17     Feb 26, 2024
  8. maciejz

    In terms of minimum size, keep in mind what models like XGBoost are doing underneath. On each iteration, they build a decision tree on your data. So, if you are using a max depth of 1 (for illustration here), which means one split on a single feature (pretty much the simplest model possible), the algo picks a split point and calculates the y_hat on the left and right side of the split point by taking the mean of the y values in each of those regions. The "optimal" split point is the one that will minimize your error, probably MSE (it's a good idea to use MSE even for trading systems).
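
    Here is a toy version of that split search, just to make it concrete (x and y are placeholder arrays for a single feature and the target; this ignores everything real implementations do for speed):

        import numpy as np

        def best_stump_split(x, y):
            # try every candidate split point and keep the one with the
            # lowest total squared error (equivalent to minimizing MSE)
            order = np.argsort(x)
            x_sorted, y_sorted = x[order], y[order]
            best_split, best_sse = None, np.inf
            for i in range(1, len(x_sorted)):
                left, right = y_sorted[:i], y_sorted[i:]
                sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
                if sse < best_sse:
                    best_split, best_sse = (x_sorted[i - 1] + x_sorted[i]) / 2, sse
            return best_split, best_sse   # y_hat on each side is that side's mean

        rng = np.random.default_rng(0)
        x = rng.normal(size=1_000)
        y = (x > 0.3).astype(float) + rng.normal(scale=0.5, size=1_000)
        print(best_stump_split(x, y))     # the split point should land near 0.3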

    Now, the reason all of the above relates to your question about the minimum split size is that y_hat for a split is the mean of the y points in that split. That is literally the definition of a sample mean. And we know that the standard error (SE) of the mean (so, the SE of y_hat) is inversely proportional to the square root of the sample size. Well, the sample size is your split size -- it is the number of data points on the left (or right) side of your split point. So, if you increase your minimum split size by a factor of 4, you decrease the standard error of your y_hat by a factor of 2.
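
    You can sanity-check that scaling in a couple of lines (pure synthetic noise, just to illustrate the square-root relationship):

        import numpy as np

        rng = np.random.default_rng(0)
        for n in (1_000, 4_000, 16_000):
            # standard deviation of the sample mean across many resamples
            means = [rng.normal(size=n).mean() for _ in range(2_000)]
            print(n, round(float(np.std(means)), 5))   # roughly halves each time n quadruples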

    Reducing the SE of your y_hat is going to reduce the variance of your model, and typically you want that. Optimizing the bias/variance trade-off can't be done in-sample. So, to truly determine the optimal model complexity, including the max depth and minimum split size, requires cross-validation. You'd have to try different hyper-parameter configurations and see which performs best on validation. Keep in mind that this burns your validation data and thus you can't use your validation performance as an estimate of future out-of-sample performance. You'd have to have yet another clean data set that you'd use to estimate your expected OOS performance.
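
    A sketch of that discipline (time-ordered splits, a small illustrative grid over depth and min split size, and a test set that is only touched once; the data here is synthetic and the numbers are assumptions, not recommendations):

        import numpy as np
        import xgboost as xgb

        rng = np.random.default_rng(1)
        X = rng.normal(size=(50_000, 10))       # stand-in features, assumed time-ordered
        y = rng.normal(size=50_000)             # stand-in target

        n = len(X)
        tr, va = int(n * 0.6), int(n * 0.8)     # 60% train, 20% validation, 20% test

        def mse(m, a, b):
            return float(np.mean((m.predict(a) - b) ** 2))

        best = None
        for depth in (1, 2, 3):
            for min_child in (1_000, 10_000):   # min_child_weight ~ min split size for squared error
                m = xgb.XGBRegressor(max_depth=depth, min_child_weight=min_child,
                                     n_estimators=100, learning_rate=0.05)
                m.fit(X[:tr], y[:tr])
                score = mse(m, X[tr:va], y[tr:va])   # validation picks the configuration...
                if best is None or score < best[0]:
                    best = (score, m)

        print(mse(best[1], X[va:], y[va:]))     # ...only the untouched test set estimates OOS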

    From my experience with daily frequency models, I'd say that you want at the very very least 1K data points in each split; but the more the better, typically. This will also affect your model complexity. If you have 500K data points, and you want a min of 1K in each split and have about 10 potential splits per dimension, you could get to a depth 6 model. But, that's way overkill IMHO. I'd stick to a depth 3 or lower, and increase min split size to 10K.

    One more thing: XGBoost, at least when I last tried it, did not respect the minimum split size configuration. This is one of the reasons why XGBoost will often not provide very good results on data with low signal-to-noise, like pretty much all financial data.

    Hope that helps. Let me know what you find.
     
    #18     Feb 26, 2024