Not only has no one written that the domain between sub-segments must be the same, but I have repeatedly referenced its being variable as a relevant factor. That said, great exhibits (labeling aside).
Simple models, and simple changes to them, can yield vastly different results. A question that may help: how many reasons do you have to believe your solutions are not overfit? It doesn't matter what they are, but how you establish them matters greatly. These reasons may even be superior to out-of-sample and forward testing, because if they're right, the solutions should work regardless of these tests, though the tests could still act as a tool for model validation. Complex models, on the other hand, may be overfit already, simply because of how they became so complex in the first place (in order to fit the data, perhaps?). They're often characterized by a lack of robustness and fickle dependencies (i.e. bad data quality). It's a mind-bender and a topic of exploration that may take lifetimes.
I'm here for trading-related entertainment.

- Segment 1 (70% of data) turns out to be based on a strong bull market.
- Segment 2 (30% of data) turns out to be based on a rapid decline.
- Segment 3 (100% of data)

* We have a long-only strategy.
* We are blinded and have no idea what the data in segment 2 looks like.

A) If we tested strategies based only on segment 1, then the equity curves could significantly underperform on segment 2, making the strategies no longer viable. If some still performed as expected (even after a regime change), then we know what to investigate further.

B) If we were unblinded and tested strategies across all data, 1+2 (segment 3), our strategy design could have already compensated for the decline seen in segment 2. (In fact, we might have decided that a long-only strategy was no longer a viable option.) Either way, we have opened ourselves up to curve fitting, or at least increased the likelihood.

When segment 1 contains vastly different characteristics from segment 2, the strategies we arrive at in (B) are going to be different from the strategies we arrive at in (A). Even though the strategies that performed well in (A) will still perform the same in (B), they could easily get overlooked in favor of better-performing strategies derived only from (B). Therefore we will not arrive at the same choice of strategies in both cases.
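To make the hypothetical concrete, here is a rough sketch of the 70/30 split with a made-up price series: a steady rise followed by a faster decline. The drift numbers, the helper names and the buy-and-hold stand-in for "a long-only strategy" are all assumptions for illustration, not anything taken from the exhibits in this thread.

```python
def split_segments(prices, train_percent=70):
    """Split a price list into segment 1 (first 70%) and segment 2 (the rest)."""
    cut = len(prices) * train_percent // 100
    return prices[:cut], prices[cut:]


def long_only_return(prices):
    """Total return of simply holding from the first bar to the last."""
    return prices[-1] / prices[0] - 1.0


# Synthetic data (assumed): 0.1% daily gains, then 0.3% daily losses.
prices = [100.0]
for day in range(1, 1000):
    drift = 0.001 if day < 700 else -0.003
    prices.append(prices[-1] * (1.0 + drift))

segment_1, segment_2 = split_segments(prices)
segment_3 = prices  # the full history

for name, segment in (("segment 1", segment_1),
                      ("segment 2", segment_2),
                      ("segment 3", segment_3)):
    print(f"{name}: {len(segment)} bars, long-only return {long_only_return(segment):+.1%}")
```

A long-only approach judged on segment 1 alone looks fine here and then loses money on segment 2, which is case (A). Anyone who sees segment 3 up front would likely never pick it, which is case (B).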
You are missing that the x-axis values are different in segment 1 vs segment 2. Hypothetically, the best strategy for segment 1 can also be the best strategy for segment 2.
As a matter of fact, if

- a strategy is built on a good prior hypothesis
- the effect has good statistical significance
- and the number of free parameters is low (preferably none)

it's a perfectly OK thing to do. In fact, you would be better served building a collection of simple strategies this way vs going in circles optimizing something complex.
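On the "statistical significance" point, a minimal sketch, assuming a plain one-sample t-statistic on per-trade returns is the kind of check meant (the post does not say which test it has in mind). The function name and the sample returns are made up for illustration.

```python
import math
import statistics


def t_statistic(returns):
    """One-sample t-statistic for the hypothesis that the mean return is zero."""
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two returns")
    sample_mean = statistics.fmean(returns)
    sample_std = statistics.stdev(returns)  # sample standard deviation (n - 1)
    if sample_std == 0:
        raise ValueError("returns have zero variance")
    return sample_mean / (sample_std / math.sqrt(n))


example_returns = [0.01, 0.02, 0.03]  # made-up per-trade returns
example_t = t_statistic(example_returns)
print(f"t = {example_t:.3f} on {len(example_returns)} trades")
```

Note that a t-statistic like this says nothing about how many other strategies were tried before this one, which is why the prior hypothesis and the low parameter count matter alongside it.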
Oh, of course. I agree. I was merely pointing out that that's not the conclusion that can be drawn from this particular hypothetical.
What are you talking about? Hypothetically, sure, the best strategy for segment 1 can also be the best strategy for segment 2. However, it can just as well not be the best strategy.
I wanted to expound, but had to stop my analysis of your post. (See below.) I know, right? OK. This is not what the OP says. The OP says that we pick one of the available strategies that does well in segment 2 as well as in segment 1. So I guess I must stop here, since your hypothetical requires something different.
Optimizing on seg1 and then picking only strats that look pretty on seg2 will result in a similar selection of strats as optimizing on the whole seg3. If we are testing a non-optimized strat - same thing. We end up with a similar selection regardless of whether we explicitly optimize some parameters or not. By selecting only pretty equity curves we are "optimizing". It's really not that hard to understand (or I guess it is for some people judging from some of the replies on the thread). The out of sample thing is a fallacy and great for marketing, especially to retail traders. It proves nothing and does nothing to increase the likelihood of success live. Other tests of robustness must be implemented.
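For anyone who wants to poke at this claim, here is a rough simulation sketch. It assumes strategies with no edge at all (zero-mean random daily returns), an arbitrary "top 10% on seg1" cut, and "looks pretty on seg2" meaning a positive average return. All of those choices, and the extra unseen "live" stretch, are assumptions made for illustration. It covers only the no-edge case, so it does not settle the argument.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

N_STRATEGIES = 1000
SEG1_DAYS, SEG2_DAYS, LIVE_DAYS = 350, 150, 250  # 70/30 split plus unseen data


def random_returns(n_days):
    # Zero-mean daily returns: no real edge, by construction.
    return [rng.gauss(0.0, 0.01) for _ in range(n_days)]


def average(values):
    values = list(values)
    return sum(values) / len(values)


strategies = []
for _ in range(N_STRATEGIES):
    seg1 = random_returns(SEG1_DAYS)
    seg2 = random_returns(SEG2_DAYS)
    live = random_returns(LIVE_DAYS)
    strategies.append({
        "seg1": average(seg1),
        "seg2": average(seg2),
        "seg3": average(seg1 + seg2),  # seg3 is all of seg1 and seg2
        "live": average(live),
    })

ids = range(N_STRATEGIES)

# Route A: rank on seg1, then keep only those that also look good on seg2.
top_seg1 = sorted(ids, key=lambda i: strategies[i]["seg1"], reverse=True)[: N_STRATEGIES // 10]
survivors = {i for i in top_seg1 if strategies[i]["seg2"] > 0}

# Route B: rank on the whole of seg3 and take the same number of strategies.
top_seg3 = set(sorted(ids, key=lambda i: strategies[i]["seg3"], reverse=True)[: len(survivors)])

overlap = len(survivors & top_seg3) / len(survivors)
chance_overlap = len(survivors) / N_STRATEGIES  # overlap expected from random picks

avg_seg1 = average(strategies[i]["seg1"] for i in survivors)
avg_seg2 = average(strategies[i]["seg2"] for i in survivors)
avg_live = average(strategies[i]["live"] for i in survivors)

print(f"survivors: {len(survivors)}, overlap with seg3 picks: {overlap:.0%} "
      f"(chance: {chance_overlap:.0%})")
print(f"avg daily return of survivors  seg1: {avg_seg1:+.5f}  "
      f"seg2: {avg_seg2:+.5f}  live: {avg_live:+.5f}")
```

The survivors look good on both seg1 and seg2 because they were picked for exactly that, yet they have no edge, so their unseen-data average sits near zero. The overlap figure is a crude check on the "similar selection" point. It depends heavily on the cutoffs chosen above.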