Excessive curve fitting is anathema to any system developer, yet every system depends in some respect on a set of parameters. How many bars is the lookback period? What percentage is the retracement? What is the profit target? Personally, I try to use adaptive parameters, such as a lookback period that oscillates between a maximum and a minimum depending on market choppiness, or targets that scale with volatility, but ultimately I still have to input a few arbitrary numeric values.

What would be, philosophically, the best way to proceed? On one hand, I could backtest across a few different periods and choose the parameters that give the best overall performance. On the other hand, one could argue that the market of three years ago is not the market of this year, and that parameters should be calibrated only on performance in the most recent period. Someone else could argue that it's not so much about the choice of period as about robustness across different markets. I don't have a strong opinion on what is best; I would just like to understand other people's reasoning and motivations for choosing one approach over another.
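To make "adaptive parameters" concrete, this is roughly the kind of thing I mean, sketched with Kaufman's efficiency ratio as the choppiness proxy; the bounds and the direction of the mapping are placeholders, not my live settings:

Code:
# Sketch of an adaptive lookback: map a choppiness measure (Kaufman's
# efficiency ratio here) onto a lookback between a minimum and a maximum.
# All names and numbers are illustrative only.
import numpy as np

def efficiency_ratio(closes, window=10):
    """Net move divided by the sum of absolute bar-to-bar moves (0 = choppy, 1 = trending)."""
    closes = np.asarray(closes, dtype=float)
    net = abs(closes[-1] - closes[-window - 1])
    path = np.abs(np.diff(closes[-window - 1:])).sum()
    return net / path if path > 0 else 0.0

def adaptive_lookback(closes, min_lb=10, max_lb=50, window=10):
    """Choppier market -> longer lookback, cleaner trend -> shorter lookback."""
    er = efficiency_ratio(closes, window)
    return int(round(max_lb - er * (max_lb - min_lb)))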
Changing the parameters, data, time periods, etc. that you are attempting to fit a curve to won't fix your overfitting problem. I've only found one solution to this problem, and that is to understand why the algorithm you have created does what it does. It's not enough to say that buying IBM on days when it rained in Greenville, SC produced a 76% annual return over the last 5 years. If you don't understand why that happened, then it's probably just a case of overfitting. It's better to start with a hypothesis.

Here's a fictitious example: I noticed that when Calavo Growers, Inc. begins harvesting their avocados, wheat futures start moving up over the next two weeks. That may not be a trading strategy yet, but my suspicion is that there's a connection. It's possible the avocado farmer in Florida is responding to some aspect of the weather that correlates with the weather for wheat up north without realizing it; he's just learned how to optimize his avocado output. So I check the data to see if this is correct and, boom, there is indeed a long-running correlation that's pretty good. I can't explain it precisely, but it's apparent that the avocado farmer and his weather in Florida tell us something about the weather for the wheat farm two thousand miles away. This is now a sound trading strategy.

Here is an example that will only create overfitting and no value: I decide to compute correlations between green days and red days on all stocks over the last year. Company XYZ is the one most likely to have a green day if the prior day was red. In fact, you'd have made a 143% return with this strategy over the last year! But it turns out you can achieve the same outcome with random data. All we did was track down an outlier. It's not predictive of the future, and no amount of changing time periods, quantizing parameters differently, etc. is going to make it work.
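To make the random-data point concrete, here is a rough simulation (made-up rule, made-up numbers): scan a thousand pure-noise "stocks" for the one where buying after a red day worked best, then check that same stock on fresh noise.

Code:
# Illustration of outlier hunting: the best of many random-walk "stocks"
# looks great in-sample but shows nothing on new random data.
import numpy as np

rng = np.random.default_rng(0)

def rule_return(returns):
    """Total return from holding the day after every red (negative) day."""
    signal = returns[:-1] < 0                 # yesterday was red
    return np.prod(1 + returns[1:][signal]) - 1

n_stocks, n_days = 1000, 252
in_sample = rng.normal(0, 0.02, size=(n_stocks, n_days))
out_sample = rng.normal(0, 0.02, size=(n_stocks, n_days))

best = np.argmax([rule_return(r) for r in in_sample])
print("best in-sample return   :", rule_return(in_sample[best]))   # looks impressive
print("same stock out-of-sample:", rule_return(out_sample[best]))  # just noise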
Divide the data into in- and out-of-sample parts. For example, 70% in-sample, 30% out-of-sample. Create rules with ideas that make sense to you using the in-sample data. Test them on the out-of-sample data. If they work well, they might be a candidate for forward testing on completely new data. If not, the rules likely won't work in the future.
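In code terms, the split is just chronological. A minimal sketch with a placeholder price series; the rules and the backtest itself are up to you:

Code:
# Minimal sketch of the 70/30 chronological split described above.
# The price series is a placeholder; plug in your own data and rules.
import numpy as np

prices = 100 + np.cumsum(np.random.default_rng(1).normal(0, 1, 1000))

cut = int(len(prices) * 0.70)
in_sample, out_of_sample = prices[:cut], prices[cut:]

# 1. develop and tune rules using in_sample only
# 2. freeze the rules, then run them once on out_of_sample
# 3. if performance is comparable, consider forward testing on new data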
Hello ph1l, what happens when you are done with testing (curve fitting) in-sample, you test the algo out-of-sample, and the results are not to your liking? Do you go back to the in-sample data, try to curve fit some more, and try again?
Yes, I might try to fit again on the same in-sample data. But if that's done too many times on the same data with the same general trading idea, the out-of-sample data is effectively also in-sample. So the system still might not work after it looks good on in-sample and out-of-sample data. Here is an article that discusses this issue. https://quantdare.com/deflated-sharpe-ratio-how-to-avoid-been-fooled-by-randomness/ It also has suggestions on how to avoid the problem.
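To illustrate the effect the deflated Sharpe ratio corrects for: the best of N strategies with zero edge can show a respectable Sharpe ratio purely by chance, and the more trials you run against the same data, the better that "best" looks. A quick simulation of my own (not from the article):

Code:
# The more zero-edge strategies you test on the same data, the better the
# best one looks by chance alone.
import numpy as np

rng = np.random.default_rng(42)
n_days = 252

def annualized_sharpe(daily_returns):
    return np.mean(daily_returns) / np.std(daily_returns) * np.sqrt(252)

for n_trials in (1, 10, 100, 1000):
    sharpes = [annualized_sharpe(rng.normal(0, 0.01, n_days)) for _ in range(n_trials)]
    print(f"best Sharpe out of {n_trials:4d} zero-edge strategies: {max(sharpes):.2f}")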
If you keep using your "out of sample" data over and over, then it's not out-of-sample data anymore. That's why this doesn't work.
No, I don't model futures. But I do model ETFs, which would likely be similar. For example, this is a chart of QQQ with fitted curves for high and low prices using the past 4 months of daily prices as inputs [chart attached]. The curves have equations

Code:
y_hi = 366.871154785156 + -0.411301225423813 * x + 15.3298168182373 * skewed_cos(twopi / 91.9550263045434, 1.28351449966431, 0.236026108264923, x, 100);
y_lo = 356.874694824219 + -0.399159908294678 * x + 16.5079669952393 * skewed_cos(twopi / 95.0921408381499, 1.53554952144623, 0.0404615998268127, x, 100);

where
- y_hi and y_lo are the predicted high and low prices of QQQ
- x is the number of calendar days past the start of the sampling period (input prices are adjusted for splits and dividends and interpolated for days the market was closed)
- skewed_cos(freq, phase, skew, x, iter) = cos(freq * x + phase) when iter == 0
- skewed_cos(freq, phase, skew, x, iter) = cos(freq * x + phase + skew * skewed_cos(freq, phase, skew, x, iter - 1)) when iter > 0

Each curve has a straight line (trend) added to a cosine wave that's left- or right-translated (left-translated in this example). The equations for curves fitting different assets are different, of course, and the equations would need to be recalculated each day to be useful. Other than the price data, the only input for the curves is the number of calendar days in the input period.

So my basic idea is to buy after the skewed cosine wave is at a trough and not overwhelmed by the linear trend, and to sell when the predicted skewed cosine wave is at a peak. Backtesting shows this is profitable over various ETFs with 18-22 years of daily prices.

To make that work better, I create indicators based on the equations, related to how good the fits were, the current phase of the fitted curve, and how the curves for the high and low prices relate to each other. Then I combine and optimize these indicators on 70% of the 18-22 years of ETF input data and create additional filtering rules like:

Code:
$R1 = $R0 = 0;
if ( $cyFitProp_lo >= 0.22716 ) { if ( $cyFitProp_lo <= 0.732629 ) { $R1 = 1 + $R1 ; } } $R0 = 1 + $R0 ;
if ( $cyFitProp_hi >= 0.29458 ) { if ( $cyFitProp_hi <= 0.904422 ) { $R1 = 1 + $R1 ; } } $R0 = 1 + $R0 ;
if ( $cphase0_lo >= 2.8355 ) { if ( $cphase0_lo <= 3.95609 ) { $R1 = 1 + $R1 ; } } $R0 = 1 + $R0 ;
if ( $cphase0_hi >= 2.80795 ) { if ( $cphase0_hi <= 3.76181 ) { $R1 = 1 + $R1 ; } } $R0 = 1 + $R0 ;
if ( $rlh_per0 >= 0.316866 ) { if ( $rlh_per0 <= 1.02584 ) { $R1 = 1 + $R1 ; } } $R0 = 1 + $R0 ;
if ( $rlh_phase0 >= 0.000702903 ) { if ( $rlh_phase0 <= 6.28273 ) { $R1 = 1 + $R1 ; } } $R0 = 1 + $R0 ;
if ( $detrPropNextBar_lo >= 2.92023e-09 ) { if ( $detrPropNextBar_lo <= 0.32465 ) { $R1 = 1 + $R1 ; } } $R0 = 1 + $R0 ;
if ( $detrPropNextBar_hi >= 8.10645e-06 ) { if ( $detrPropNextBar_hi <= 0.635021 ) { $R1 = 1 + $R1 ; } } $R0 = 1 + $R0 ;
if ( $R1 >= $R0 ) { $return = 1 ; }

where the floating-point values were genetically optimized on the in-sample data. The same rules with the same parameter values would apply to all the ETFs.

Then I test the basic idea plus combined rules on the out-of-sample data. When these rules show similar results on the in-sample and out-of-sample data, the rules can be forward tested and/or tested with real money.

I don't think this overfits the data because
- The rules have similar performance on in-sample and out-of-sample data.
- The input is a fairly long historical period that covers different types of market action.
- The same rules with the same parameters get applied to different asset classes, sectors, and regions of ETFs.

Sometimes, it even works.
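If anyone wants to experiment with the curve shape, this is just a direct transcription of the skewed_cos definition and the y_hi equation above into Python (illustrative only, not the actual fitting code):

Code:
# Direct transcription of the skewed_cos definition and the y_hi equation above.
from math import cos, pi

twopi = 2 * pi

def skewed_cos(freq, phase, skew, x, iters):
    """Cosine whose phase is perturbed by its own value, iterated 'iters' times."""
    value = cos(freq * x + phase)            # iter == 0 case
    for _ in range(iters):                   # apply the recursion iters times
        value = cos(freq * x + phase + skew * value)
    return value

def y_hi(x):
    return (366.871154785156 - 0.411301225423813 * x
            + 15.3298168182373 * skewed_cos(twopi / 91.9550263045434,
                                            1.28351449966431, 0.236026108264923, x, 100))

print(y_hi(0), y_hi(30))  # predicted QQQ highs 0 and 30 days into the sample period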
@ph1l Using in-sample and out-of-sample data is the classic textbook approach; however, I have always wondered whether it is really the most appropriate one. Let's start from the assumption that every system works best only during a certain time window. The market evolves with time, opportunities come and go, and market participants and structure change, so outside of that window performance inevitably degrades. Maybe you have a good system that could make money now, but what if your out-of-sample period happens to be exactly a period of poor performance? Wouldn't it make more sense to train your parameters only on the most recent time frames?

I also use an in-sample and out-of-sample method, but in my case I use a population of 50 stocks (picked so that there is a variety of small caps, mid caps, large caps, growth, value, etc.) to run a standard backtest; when I find something promising, I backtest it on a bigger basket of 500 stocks. Which time frame to choose, however, is something I'm still really unsure about. Curious to hear everyone's thoughts about it.
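One possible compromise between "most recent data only" and a fixed 70/30 split is walk-forward testing: re-optimize on a rolling window of recent data and score each parameter set only on the slice that immediately follows it. Roughly something like this, where optimize() and evaluate() are hypothetical stand-ins for whatever backtest you use:

Code:
# Rough sketch of walk-forward testing: fit parameters on a rolling recent window,
# then score them only on the period immediately after it.
def walk_forward(prices, optimize, evaluate, train_len=750, test_len=125):
    scores = []
    start = 0
    while start + train_len + test_len <= len(prices):
        train = prices[start:start + train_len]
        test = prices[start + train_len:start + train_len + test_len]
        params = optimize(train)               # fit only on the recent window
        scores.append(evaluate(test, params))  # score on the unseen slice after it
        start += test_len                      # roll the window forward
    return scores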