Before you post this: I've been reading up on the deflated Sharpe ratio... One of the graphs plots the relationship between overfitting and how much data you use.
No, trials are not backtests. A trial is each time you fitted the data. Basically, the more fits you do, the greater the chance of finding a spurious result, and the more data you need. So unlike data points, you want fewer trials. And this particular plot would look different depending on the underlying Sharpe ratio.

Also, it definitely isn't the case that having more frequent data reduces the history of data required, since the noise and the parameter variability both scale with the square root of time. If you need 20 years of daily data for statistical significance, you will also need about 20 years of one-minute data*

* [The exception, clearly, is if you have data that is 'too slow' for your trading system; of course an HFT would benefit from having tick data rather than daily data]

GAT

* [Technical note: very slightly less, because the T-distribution is converging on a normal distribution, so the critical value of T falls a tiny amount going from 2,500 observations to several million.]
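The square-root-of-time point above can be sketched numerically. This is a minimal illustration with hypothetical numbers (an annualised Sharpe of 0.5, 252 trading days and 390 minutes per day): the per-period Sharpe shrinks by exactly the factor that the observation count grows, so the t-statistic depends only on calendar years of history, not sampling frequency.

```python
import math

def t_stat(annual_sr: float, years: float, periods_per_year: int) -> float:
    """t-statistic of a measured Sharpe ratio against zero.

    The per-period Sharpe scales down by sqrt(periods_per_year), while
    the number of observations scales up by the same factor, so the
    result depends only on annualised SR and years of history.
    """
    per_period_sr = annual_sr / math.sqrt(periods_per_year)
    n_obs = years * periods_per_year
    return per_period_sr * math.sqrt(n_obs)

# Hypothetical strategy: annualised Sharpe of 0.5, 20 years of history.
daily = t_stat(0.5, 20, 252)          # ~5,000 daily observations
minute = t_stat(0.5, 20, 252 * 390)   # ~2 million one-minute observations

print(daily, minute)  # identical: sampling faster buys no significance
```

The only (tiny) difference in practice is the critical value of the T-distribution mentioned in the technical note, which falls slightly as the observation count grows.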
True. But I still like having all the data, going back to 1927-37, even to 1776. [Edit: all the data, on larger timeframes]
See https://gmarti.gitlab.io/qfin/2018/05/30/deflated-sharpe-ratio.html and https://poseidon01.ssrn.com/deliver...80070108123085072006119065&EXT=pdf&INDEX=TRUE "THE DEFLATED SHARPE RATIO: CORRECTING FOR SELECTION BIAS, BACKTEST OVERFITTING AND NON-NORMALITY" David H. Bailey, Marcos López de Prado, July 31, 2014
Just forget about the graph. The reason Marcos López de Prado formulated the False Strategy theorem is that he wanted to raise awareness of overfitting from multiple backtests. There are many ways to estimate the probability of overfitting, such as the deflated Sharpe ratio, the family-wise error rate, etc. However, none of them defines what a "trial" is. Some backtesters make only a small change to the code between runs, while others make a big change; under those conditions, how can both have the same overfitting probability? That's why the deflated Sharpe ratio is useless in some scenarios. But yes, you should be careful with multiple testing. More importantly, your strategy needs to be explainable: don't use too many variables, and run LASSO to reduce overfitting.
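For reference, the deflated Sharpe ratio from the Bailey & López de Prado paper linked above can be sketched in a few lines of stdlib Python. All the inputs below (per-period Sharpe, number of observations, cross-trial variance) are hypothetical numbers chosen for illustration; the point is just the mechanism being debated here: the hurdle `sr0` grows with the number of trials, so the same backtest looks worse the more you searched.

```python
from math import e, sqrt
from statistics import NormalDist

N = NormalDist()            # standard normal
EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant

def expected_max_sharpe(n_trials: int, var_trial_sr: float) -> float:
    """Expected maximum Sharpe among n_trials skill-less strategies
    (the False Strategy theorem approximation)."""
    return sqrt(var_trial_sr) * (
        (1 - EULER_GAMMA) * N.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * N.inv_cdf(1 - 1 / (n_trials * e))
    )

def deflated_sharpe(sr, n_obs, n_trials, var_trial_sr, skew=0.0, kurt=3.0):
    """Probability that the observed per-period Sharpe exceeds the
    expected maximum Sharpe produced by n_trials of random search."""
    sr0 = expected_max_sharpe(n_trials, var_trial_sr)
    z = ((sr - sr0) * sqrt(n_obs - 1)
         / sqrt(1 - skew * sr + (kurt - 1) / 4 * sr ** 2))
    return N.cdf(z)

# Hypothetical backtest: per-period SR of 0.1 over 1,250 observations,
# with a variance of 0.01 across the trial Sharpe ratios.
print(deflated_sharpe(0.1, 1250, n_trials=10, var_trial_sr=0.01))
print(deflated_sharpe(0.1, 1250, n_trials=100, var_trial_sr=0.01))
```

Note that the ambiguity raised above lives entirely in `n_trials` and `var_trial_sr`: the formula is well defined, but what counts as one trial is not.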
It also depends on the distribution of your trades over time. If your trades cluster in particular periods (for example, you backtest 2000-2020 and more than 1.5k trades, over 75%, are completed in 2008 and 2020), your ML model may not learn the general structure of the data.
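A simple sanity check for this kind of clustering is to measure what fraction of trades falls in the busiest few years. A minimal sketch, using made-up trade dates that mimic the >75% example above:

```python
from collections import Counter

def trade_concentration(trade_years, top_k=2):
    """Fraction of all trades that fall in the top_k busiest years."""
    counts = Counter(trade_years)
    busiest = counts.most_common(top_k)
    return sum(n for _, n in busiest) / len(trade_years)

# Hypothetical backtest: 2,000 trades over 2000-2020, heavily
# clustered in the 2008 and 2020 crisis years.
years = [2008] * 800 + [2020] * 760 + list(range(2000, 2020)) * 22
share = trade_concentration(years)
if share > 0.75:
    print(f"warning: {share:.0%} of trades sit in just 2 years")
```

If most of the sample comes from two crisis regimes, the model is effectively being trained on two market conditions, whatever the nominal 20-year window says.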
I think more important than the length of the history is the data you are using for your backtesting. You should make sure it covers different market conditions that can stress-test your trading plan.