Is Walk-Forward (out of sample) testing simply an illusion?

pursuit · Oct 17, 2017

If we test a bunch of strategies on data segment 1 and then data segment 2 and then keep the ones that do well on both..........

Isn't that the same as testing them on segment 3 which is a combination of 1 and 2 and keeping "the good ones"?

We'll arrive at the same choice of strats in both cases, no?

truetype · Oct 17, 2017

Not sure about 'illusion,' but yes, it's overrated.

tommcginnis · Oct 17, 2017

pursuit said:
If we test a bunch of strategies on data segment 1 and then data segment 2 and then keep the ones that do well on both..........
Isn't that the same as testing them on segment 3 which is a combination of 1 and 2 and keeping "the good ones"?

We'll arrive at the same choice of strats in both cases, no?

Nope.
You have to remember the border:
If you imagine the market (or whatever) as a perfect wave, and your border of segment1→segment2 was the very top of the wave, your segment1 strats would be biased "long". {etc etc etc.}

I am totally forgetting the test condition for a fair conclusion of 'same data' comparisons, but it's readily obtainable, and should be a snap to input. {Sorry. Market's open....Gotta go.}

But the idea is that if it's possible for the border to play a role in success/failure, then *where* the border is {I'm sliding it back and forth in my mind} matters. And so, Seg1 + Seg2 ≠ Seg3.
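
The border effect can be sketched with a toy example. Everything here is an assumption made up for illustration: the "market" is one full sine cycle, the strategy is simply always long, and the names `prices`, `long_only_pnl` and `segment_pnls` are hypothetical. The sketch slides the border and reports how the same strategy scores on each segment.

```python
import math

# Hypothetical "perfect wave" market: one full sine cycle sampled at 101 points.
N = 101
prices = [100 + 10 * math.sin(2 * math.pi * i / (N - 1)) for i in range(N)]

def long_only_pnl(start, end):
    """P&L of buying at index `start` and selling at index `end`."""
    return prices[end] - prices[start]

def segment_pnls(border):
    """Always-long P&L on segment 1 [0, border] and segment 2 [border, N - 1]."""
    return long_only_pnl(0, border), long_only_pnl(border, N - 1)

# Slide the border: top of the wave, middle, bottom of the wave.
for border in (25, 50, 75):
    seg1, seg2 = segment_pnls(border)
    print(f"border={border:3d}  seg1={seg1:+6.2f}  seg2={seg2:+6.2f}  total={seg1 + seg2:+6.2f}")
```

With the border at the top of the wave (index 25), always-long looks excellent on segment 1 and terrible on segment 2; with the border at the bottom (index 75), the verdict flips. The total over the whole wave is about zero in every case, so the per-segment results depend on where the cut falls even though the combined result does not.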

pursuit · Oct 17, 2017

tommcginnis said:
Nope.
You have to remember the border:
If you imagine the market (or whatever) as a perfect wave, and your border of segment1→segment2 was the very top of the wave, your segment1 strats would be biased "long". {etc etc etc.}

I am totally forgetting the test condition for a fair conclusion of 'same data' comparisons, but it's readily obtainable, and should be a snap to input. {Sorry. Market's open....Gotta go.}

But the idea is that if it's possible for the border to play a role in success/failure, then *where* the border is {I'm sliding it back and forth in my mind} matters. And so, Seg1 + Seg2 ≠ Seg3.

Doesn't make sense. It doesn't matter where the border is. We're looking for a "nice equity curve" before and after the border, regardless of the border's location.

maler · Oct 17, 2017

If you mean data 1 as in-sample and data 2 as out-of-sample, you raise a good question. It goes to the heart of what knowledge is.
Backtesting is rooted in the scientific method, which uses statistics to test a hypothesis.
The investigation can only tell you that the hypothesis is true (in this case, that the strategy works) with a certain degree of confidence, so there is always the risk of a false positive.
Formulating the hypothesis before looking at the (out-of-sample) data is but one of the many things to be aware of. This is strictly so the numbers make any kind of sense: it is very easy to be unaware that you effectively looked at the out-of-sample data if your analysis was inspired by third parties' insights that were in turn informed by said data.
The size of the out-of-sample data is also important. Say you get an out-of-sample p-value of 0.1 that the information coefficient of your strategy is above 1 or 2 or whatever your psych profile is comfortable running; there is still a 1 in 10 chance you are being fooled. So if you try 10 different things, even though you did not look at the out-of-sample data when cogitating, it is very likely you are going to be fooled at least once.
However, none of this deals with what I consider to be most important, namely the law of ever-changing hypotheses.
By the time you find the key, the lock has probably changed. But this dynamic is a discussion for another thread.
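
The "try 10 different things" arithmetic can be made concrete. This is a minimal sketch that assumes the 10 tests are independent and that each has the same 0.1 chance of a false positive; the function name `prob_fooled_at_least_once` is hypothetical. Under those assumptions, the chance of being fooled at least once in n tries is 1 - (1 - p)^n.

```python
def prob_fooled_at_least_once(p_false_positive, n_trials):
    """Chance of at least one false positive in n independent trials."""
    return 1.0 - (1.0 - p_false_positive) ** n_trials

# How the risk grows with the number of ideas tried at p = 0.1.
for n in (1, 5, 10, 20):
    print(n, round(prob_fooled_at_least_once(0.1, n), 3))
```

For 10 tries at p = 0.1 this comes to roughly 0.65, so being fooled at least once is more likely than not. Correlated strategies would change the number, since the independence assumption no longer holds.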

maler · Oct 17, 2017

As a pure math question, testing on 1 and then on 2 is more comprehensive than a test on 3.
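
One way to see why the two-step test is stricter is a toy comparison. The numbers below are invented for illustration, "doing well" is assumed to mean a positive total return, and the segment 3 return is approximated as the simple sum of the two segment returns (compounding ignored). The names `results`, `passes_both` and `passes_combined` are hypothetical.

```python
# Hypothetical per-segment total returns (in %), made up for illustration.
results = {
    "A": (12.0, -3.0),  # great on segment 1, loses on segment 2
    "B": (4.0, 3.0),    # modest but positive on both
    "C": (-2.0, 9.0),   # loses on segment 1, great on segment 2
    "D": (-1.0, -2.0),  # loses on both
}

def passes_both(results):
    """Strategies with a positive return on segment 1 AND on segment 2."""
    return sorted(name for name, (r1, r2) in results.items() if r1 > 0 and r2 > 0)

def passes_combined(results):
    """Strategies with a positive return on the combined segment 3."""
    return sorted(name for name, (r1, r2) in results.items() if r1 + r2 > 0)

print("pass both segments:", passes_both(results))
print("pass combined only:", passes_combined(results))
```

Under these assumptions, anything that passes on both segments also passes on the combined segment, but not the other way around: A and C pass on segment 3 while failing one of the halves. The two procedures therefore do not select the same set.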

Simples · Oct 17, 2017

pursuit said:
If we test a bunch of strategies on data segment 1 and then data segment 2 and then keep the ones that do well on both..........

Isn't that the same as testing them on segment 3 which is a combination of 1 and 2 and keeping "the good ones"?

We'll arrive at the same choice of strats in both cases, no?

3 is all in-sample.
2 is out-of-sample only the first time around.
Neither guarantees that your search won't yield false positives. The more you search, the more false positives.
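
The "more you search, more false positives" point can be sketched with a small simulation. It assumes strategies with no edge at all (daily returns drawn from a zero-mean Gaussian), 100 days per segment, and "doing well" meaning a positive total return on each segment; `count_survivors` and its parameters are hypothetical names.

```python
import random

def count_survivors(n_strategies, n_days=100, seed=0):
    """Count zero-edge strategies whose total return is positive on both segments."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    survivors = 0
    for _ in range(n_strategies):
        # Each segment is pure noise: the strategy has no real edge.
        seg1 = sum(rng.gauss(0.0, 1.0) for _ in range(n_days))
        seg2 = sum(rng.gauss(0.0, 1.0) for _ in range(n_days))
        if seg1 > 0 and seg2 > 0:
            survivors += 1
    return survivors

# The number of worthless strategies that "pass" grows with the search size.
for n in (10, 100, 1000):
    print(n, count_survivors(n))
```

Since each noise strategy has about a one in four chance of being positive on both segments, roughly a quarter of whatever is searched survives the walk-forward check by luck alone.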

ironchef · Oct 17, 2017

pursuit said:
If we test a bunch of strategies on data segment 1 and then data segment 2 and then keep the ones that do well on both..........

Isn't that the same as testing them on segment 3 which is a combination of 1 and 2 and keeping "the good ones"?

We'll arrive at the same choice of strats in both cases, no?

I think tom, the analyst, is right that the best fits of 1 and 2 may not be the best fit of 1 + 2. It is not difficult to construct a case where both 1 and 2 are trending, but if they are separated by a choppy period, the curve-fit formula may not be the best fit of 1 + 2. This is especially true if the fit is over-specified.

I am not smart enough to give a mathematical proof. Maybe someone else can.

pursuit · Oct 18, 2017

maler said:
So if you try 10 different things, even though you did not look at the out-of-sample data when cogitating, it is very likely you are going to be fooled at least once.

this

Macca1 · Oct 18, 2017

pursuit said:
If we test a bunch of strategies on data segment 1 and then data segment 2 and then keep the ones that do well on both..........

Isn't that the same as testing them on segment 3 which is a combination of 1 and 2 and keeping "the good ones"?

We'll arrive at the same choice of strats in both cases, no?

No way.