My basket of ETFs is very diverse. Basically it is all the big leveraged ETFs. Different countries, sectors, commodities, inverse...
The logic is packed into rules. So no number crunching with optimization (and hoping it will work in the future), and no black-box magic.
I agree completely. You start with an idea that makes sense and then look at the market's past to see whether it would have made money. But 99% of the people who do backtesting just look at the data and then try to find a system that would work on that data. That is already overfitting...
Not necessarily true; data mining is an industry. The biggest problems arise when it is mixed with hope and cognitive bias.
That is only 10 percent of the problem. The biggest problem is that people think the data they are using represents the market, as if a daily chart, or any time frame with an open, high, low, close and bid/ask volume, represents the market. That is mistake number 1 in the whole backtesting industry. The second mistake is using that data as the basis for all sorts of calculations. All that data leaves out the biggest reason why price moves. And if they do find a system that works with the data they have, it works not because of the data or the analysis but because of the position sizing they are using.
Data mining misses the most important factor: creativity, thinking outside the box. In fact, data mining does not think at all. It can make huge amounts of calculations, but that's all it can do. The information that humans provide determines how successful data mining is. The computer will never have new ideas of its own or think outside the box. If you ask a computer why it produced a result, it will "tell" you that all it can do is math. So it is a mathematical result. But markets are not mathematical. That's why price can be overbought or oversold. Behavioral finance is very important in trading, and that's something you don't use in data mining. Data mining is an exact science; markets are not.
I understand how you think, but behavioral and social science actually rely on math. That’s why you see so many people with this background in data science. Math is not necessarily ‘linear’. Data mining can also spark ideas; at least it did for me. So it is not as black and white as you would think. Perhaps using data mining to create ideas is, in a sense, also thinking outside the box.
When you optimize, set the end date of your data set to 1 year ago, then check how your system performed over the past year. The past year becomes real out-of-sample data. Also, when you optimize, have your system run both long and short, and optimize over a time period in which the market has had some major bull and bear runs. Otherwise, you are just curve fitting to a particular type of market. If you optimize for long in a bull market, the system is not likely to work well, or at all, in a bear market, and vice versa. If you can backtest multiple symbols, test the 3x ETF pairs, like TQQQ and SQQQ, both long and short.
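The holdout idea above can be sketched in a few lines of Python. This is only a minimal illustration, not anyone's actual backtesting setup: the function name `split_in_out_of_sample`, the `(date, close)` bar layout, and the 365-day default are assumptions made for the example, and the price series is synthetic placeholder data.

```python
from datetime import date, timedelta


def split_in_out_of_sample(bars, holdout_days=365):
    """Split date-sorted (date, close) bars into optimization and holdout sets.

    The last `holdout_days` calendar days are withheld from optimization
    and kept as out-of-sample data. (Hypothetical helper for illustration.)
    """
    if not bars:
        raise ValueError("bars must not be empty")
    cutoff = bars[-1][0] - timedelta(days=holdout_days)
    in_sample = [bar for bar in bars if bar[0] <= cutoff]
    out_of_sample = [bar for bar in bars if bar[0] > cutoff]
    return in_sample, out_of_sample


# Synthetic placeholder series (NOT market data): one bar per calendar day.
start = date(2020, 1, 1)
bars = [(start + timedelta(days=i), 100.0 + i) for i in range(730)]

in_sample, out_of_sample = split_in_out_of_sample(bars)
print(len(in_sample), "bars to optimize on,", len(out_of_sample), "bars held out")
```

Optimize only on `in_sample`, then run the finished rules once on `out_of_sample`. If you go back and re-tune after looking at the holdout result, that year is no longer out of sample.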
To summarize, you are looking for a representative sample. Suppose you have a method using options and want to replicate a bear market like 2000/2001 (no option data). The closest thing (I think) is to take some (index) declines and 'lengthen' them, combined with randomly shuffling the days. Of course this is only an approximation, but since there is no data, there are no other possibilities I am aware of. Then again, a 'backtest' is in a sense always an approximation. What do you think about this approach?
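One possible reading of the 'lengthen and shuffle' idea, as a rough Python sketch. The assumptions are mine: 'lengthen' is taken to mean repeating the pooled daily returns of the chosen declines `stretch` times, and the names `daily_returns` and `synthetic_bear_path` are made up for the example. The closes below are placeholder numbers, not index data.

```python
import random


def daily_returns(closes):
    """Simple day-over-day returns of a list of closing prices."""
    return [closes[i] / closes[i - 1] - 1.0 for i in range(1, len(closes))]


def synthetic_bear_path(decline_closes, stretch=2, start_price=100.0, seed=None):
    """Build a longer synthetic decline from observed decline windows.

    Assumption: 'lengthening' means repeating the pooled daily returns
    `stretch` times, then shuffling the order of the days.
    """
    rng = random.Random(seed)
    pool = []
    for closes in decline_closes:
        pool.extend(daily_returns(closes))
    stretched = pool * stretch  # repeat the days to lengthen the decline
    rng.shuffle(stretched)      # random day order
    path = [start_price]
    for r in stretched:
        path.append(path[-1] * (1.0 + r))
    return path


# Placeholder closes for two decline windows (NOT real index data).
declines = [[100.0, 95.0, 97.0, 90.0], [50.0, 48.0, 45.0]]
path = synthetic_bear_path(declines, stretch=3, seed=42)
print(len(path) - 1, "synthetic days, final price", round(path[-1], 2))
```

Two things to keep in mind with this version. Repeating the returns also deepens the total decline, because every repeat compounds the same losses again. And shuffling changes only the order of the days, so the end price stays the same while any clustering of volatile days is lost, which probably matters for an options method.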