You're right that B is a problem, but not necessarily always a problem. Reading through this thread it's apparent that there are two groups talking past each other a bit. Group 1 is rightly cautious of data mining, which is what you're describing, and as a result feels that using testing to find an edge is always bad. It generally is bad under the standard model most of us use, which you describe well: tweak/test/fail, tweak/test/fail... There is, however, another group out there, many of them professionals, who use an entirely different model, somewhat like a Monte Carlo simulation: they simultaneously test millions of random permutations of a strategy on a data set. They then use techniques like out-of-sample tests and testing across different market conditions to winnow that down to a few hundred strategies. From there they either examine the survivors to determine whether there's a thesis to support them or they're just data-mining artifacts, or they throw money at all of them, provided they've done the statistical analysis to show it's OK to have some spurious, data-mined results in the mix. Nothing wrong with this technique; a few of the successful big hedge funds use it. It's just probably not something most of us would do. So when group 2 on this thread talks about backtesting a strategy as viable, they're probably talking about something like this. And it's perfectly OK for group 2's approach to be viable while simultaneously saying the tweak/test/fail model generally isn't.
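For anyone who hasn't seen that workflow spelled out, here is a minimal sketch of the generate-many-variants-then-winnow idea. Everything in it is a hypothetical toy: the moving-average crossover strategy, the parameter ranges, the Sharpe thresholds and the single in-sample/out-of-sample split are all stand-ins, not anyone's actual process.

```python
# Toy sketch of "test millions of random permutations, then winnow with
# out-of-sample data". Strategy, parameters and thresholds are placeholders.
import numpy as np

rng = np.random.default_rng(0)
prices = np.cumsum(rng.normal(0, 1, 2_000)) + 100   # stand-in price series
split = 1_500                                        # in-sample / out-of-sample cut

def sharpe(px, fast, slow):
    """Daily Sharpe of a toy moving-average crossover on the given price slice."""
    fast_ma = np.convolve(px, np.ones(fast) / fast, mode="valid")
    slow_ma = np.convolve(px, np.ones(slow) / slow, mode="valid")
    n = min(len(fast_ma), len(slow_ma))
    pos = np.sign(fast_ma[-n:] - slow_ma[-n:])[:-1]   # yesterday's signal
    rets = np.diff(px[-n:]) * pos                     # applied to today's move
    return rets.mean() / (rets.std() + 1e-9)

# 1) generate random parameter permutations and score them in-sample
candidates = [(rng.integers(2, 50), rng.integers(60, 250)) for _ in range(5_000)]
in_sample = [(f, s, sharpe(prices[:split], f, s)) for f, s in candidates]

# 2) winnow: keep the in-sample survivors, then demand they also hold up out-of-sample
survivors = [(f, s) for f, s, sr in in_sample if sr > 0.05]
validated = [(f, s) for f, s in survivors if sharpe(prices[split:], f, s) > 0.0]
print(f"{len(candidates)} tried, {len(survivors)} passed in-sample, {len(validated)} survived OOS")
```

On random-walk data almost nothing should survive the second filter, which is the point: the winnowing is what separates this from plain data mining.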
If we are talking about the Advances in Financial Machine Learning book, I have read it and found it mildly entertaining. Some of the specific topics were over my head, some generic stuff was full of truisms, some was "ho-hum" and most was "WTF is he talking about". It's impossible to deny that a lot of what he writes is tinted by his own approach. Anyway, as I said before, the guy is a fantastic self-promoter but pretty much failed at extracting real alpha from the markets. To quote a buddy of mine, "he was running a PnL-neutral strategy". What exactly is the "what people do" you are referring to? If you are using a real-life market phenomenon as the basis for your hypothesis (my preferred approach), you do not need ML at all and can instead use simpler things like impact models or various heuristics. FWIW, these days I do use a fair number of ML techniques (tree models for my higher-frequency alphas, for example) and have formed my own opinion about their strengths and weaknesses.
Reminds me of a conversation between a physicist and an engineer. Take options as an example. A physicist would ask: what are the basic principles behind the pricing of options, and what can cause dislocations of that price? That dislocation is then tradable. An engineer would say: I notice volatility tends to revert to the mean. That's tradable but may need constant tweaking. @ma_trader I assume you derived your winning method from basic principles? If you have a real edge, backtesting should validate your few months' profitable trading record.
Thanks for sharing those. Can't really speak to the UHF side, as I don't have much experience there. Something like a pattern, though, I would think would be amenable to testing. I get the part about altering the outcome through impact, but I wonder how you arrive at the analysis that the pattern yields some kind of outcome to begin with. Isn't that some kind of testing/analysis based on past behavior (even if only implicitly, through experience/intuition)?
Since you are on the topic of ML: few in ML would approach testing with the "tweak/test/fail, tweak/test/fail, tweak/test/fail, tweak/test/success!" loop to begin with. There are things called cross-validation and generalization in the field that are very well known, and they are the antithesis of what you describe. Secondly, we use ensemble/averaging methods to overcome some of that as well. There are still a lot of limitations to known/published ML, though. I reviewed de Prado's book, and I would say a lot of it is overly theoretical rather than hands-on. As an example, he talks about the failures of MPT and how HRP is a sound alternative, yet shows no real data to demonstrate it. When you actually look at real data, there is very little discrepancy in out-of-sample performance. That's a lot of work to validate. It would be great if there were a practical/empirical companion. Can't say I understand the adulation all that much.
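To make the cross-validation point concrete, here is a minimal sketch using scikit-learn's walk-forward splitter. The feature matrix and "forward return" target are random placeholders, the model choice is arbitrary, and there is no purging/embargo of overlapping labels (which is the extra step de Prado argues for); only the validation pattern itself is the point.

```python
# Walk-forward cross-validation instead of repeated re-tweaking on one sample.
# Features and labels are synthetic placeholders.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(2_000, 10))                 # stand-in feature matrix
y = X[:, 0] * 0.1 + rng.normal(size=2_000)       # stand-in forward returns

# Each fold trains on the past and scores on a later, untouched block,
# so generalization is measured before anyone gets to "tweak".
cv = TimeSeriesSplit(n_splits=5)
model = RandomForestRegressor(n_estimators=200, max_depth=3, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
print("out-of-sample R^2 per fold:", np.round(scores, 3))
```

The ensemble/averaging point is the same idea applied to the model itself: averaging many weak fits (here, the trees inside the forest) dampens the single-lucky-fit problem that the tweak/test/fail loop suffers from.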
Let's distinguish a "backtest" from a "model calibrated to historical data". A regression that regresses some outcome (e.g. a level of the book being taken out) on some input (e.g. the ratio of size at the top of the book to an exponential average of all levels) and produces values used in forecasting is a model, but not a backtest. A rolling simulation of using that regression to trade in a specific way is a backtest. You see the difference?
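A toy illustration of that split, with everything synthetic: the one-shot fit is the "model calibrated to historical data", the rolling loop that actually trades off its forecasts is the "backtest". The sizing rule and the 500-observation calibration window are arbitrary choices for the sketch, not a recommendation.

```python
# Model calibration vs. backtest, on synthetic stand-ins.
import numpy as np

rng = np.random.default_rng(2)
signal = rng.normal(size=1_000)                  # e.g. top-of-book size ratio
outcome = 0.3 * signal + rng.normal(size=1_000)  # e.g. subsequent price move

# --- the model: a regression calibrated to historical data (not a backtest) ---
slope, intercept = np.polyfit(signal[:500], outcome[:500], deg=1)

# --- the backtest: a rolling simulation of trading the model's forecast ---
pnl = []
for t in range(500, 1_000):
    forecast = slope * signal[t] + intercept
    position = np.sign(forecast)         # simplistic sizing rule
    pnl.append(position * outcome[t])    # realized outcome of that decision
print("simulated PnL mean:", np.mean(pnl))
```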
I don't really distinguish those cases the way that you do, especially with respect to this thread. When people on the forums say that backtesting is a waste of time, I think of backtesting as an integrated process, much like R&D, and from that perspective I don't see that process as a waste of time. I don't necessarily separate it into first building a specific model and calibrating it to historical data, then treating the rolling simulation of trading around that model as a separate "backtest" component. Both of those steps are, to me, components of backtesting, and they don't even need to be separated.

Suppose I had some hypothesis model(s) completely unknown to me. I only have the input data, some method of quantitatively generating a set of hypotheses (maybe trillions), and my objective criteria. My models themselves could (and often do) include trading decisions and responses (even bet sizing). Maybe I consider optimizing my objective over rolling or segmented windows to be superior to using all historical data (what you might call calibrating the model), so I use that criterion as an input to my model generator and optimization criterion (I could even choose how to validate as part of the optimization criterion). Fitting around an anchored historical data set might be a sub-optimal way of fitting my model. I'm interested in the fit, fits, or even ensemble of fits that best trade off optimizing my criterion against giving me confidence that the hypotheses I choose on the data I have are good representatives of the behavior I expect to see on unseen data; a rough sketch of that loop follows below.

There are many variations of this process, but again, I consider backtesting to comprise all of them. When I get new data and it behaves very differently than I expect, I need to go back and understand why my assumptions were so wrong. But part of backtesting involves trying to properly gauge that beforehand. I am glad to hear some of your descriptions (and especially examples), because it helps me understand how other people might be perceiving these concepts differently.
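Here is a compressed sketch of the generate-hypotheses-and-score-over-segmented-windows idea I'm describing. The hypothesis space (a single threshold parameter), the "stable across all windows" acceptance rule, and the six-window split are all hypothetical simplifications chosen for brevity.

```python
# Generate hypotheses, score each over segmented windows, keep the stable ones,
# rather than fitting once to an anchored pooled history. All toys.
import numpy as np

rng = np.random.default_rng(3)
returns = rng.normal(0, 0.01, size=2_400)     # stand-in return series
windows = np.array_split(returns, 6)          # segmented evaluation windows

def objective(rets, threshold):
    """Toy hypothesis: only trade the day after a move larger than `threshold`."""
    signal = np.abs(rets[:-1]) > threshold
    traded = rets[1:][signal]
    return traded.mean() if traded.size else 0.0

# Score a batch of hypotheses per window and keep those whose objective is
# consistently positive, i.e. trade off fit against confidence in generalization.
hypotheses = rng.uniform(0.001, 0.03, size=1_000)
kept = [h for h in hypotheses
        if all(objective(w, h) > 0 for w in windows)]
print(f"{len(kept)} of {len(hypotheses)} hypotheses stable across all windows")
```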
It's not a no-testing-at-all sort of thing. But the concept to exploit does not come from testing; it comes from thinking and imagination, and it does not require testing to qualify as an edge. The edges I have discovered are confidential, so I can't talk about them here.
I'd think there is a big difference between a historical study and a backtest. Let's take a different example: say someone out there is picking volatility trades. She can look at the history of implied volatility and the history of realized volatility to find trades with positive expectation. However, neither of the two is a tradable asset, and this study is not a backtest.
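A small sketch of what such a study might look like, to underline the distinction. Both series below are synthetic (the "implied" series is just constructed to sit above realized), the 21-day window and annualization factor are conventional choices, and nothing models the actual tradable instruments.

```python
# Historical study of the implied-minus-realized volatility spread.
# Synthetic data; this is a study of two non-traded series, not a backtest.
import numpy as np

rng = np.random.default_rng(4)
rets = rng.normal(0, 0.01, size=1_000)               # stand-in daily returns

# 21-day realized volatility, annualized
window = 21
realized = np.array([rets[i - window:i].std() * np.sqrt(252)
                     for i in range(window, len(rets))])

# stand-in implied-vol series that tends to sit above realized (a premium)
implied = realized + 0.02 + rng.normal(0, 0.01, size=realized.size)

premium = implied - realized
print("average implied-minus-realized premium:", round(float(premium.mean()), 4))
# A persistently positive premium tells you where to look for trades, but no
# option or variance-swap P&L is simulated here, so it remains a study.
```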