How do you avoid overfitting or over-optimization in your backtest?

intradaybill · Feb 26, 2010

Quote from dtrader98:

For instance, not only is C>O a curve fit: C-O>0, but there is an implicit parameter, which is the value zero. However you arrived at that strategy C>O, there could have been an infinite number of alternative strategies, i.e. C-O>x, where
x subset Real numbers.
More...

C>O or C-O>0 is the curve itself, not a curve fit. As soon as you introduce a parameter, like in C-O>x, you have a curve fit, which becomes the curve itself for x=0 (think about it). It is crucial to understand the difference.

There is no point speaking about curve fitting when one uses the curve itself and not another curve that is fitted to the original curve.

There is an issue of why selecting C-O>0 and not O-C>0, which may introduce selection bias, but that is a very different animal than curve fitting.

Furthermore, your argument about implicit parameters is funny. Using the same reasoning, we can assume that in every physical situation there are implicit variables, like for example, in the simple Ohm's law

V = iR, that there is an implicit variable because it can be stated as

V-iR > 0 and it is actually V-iR > x, with x=0.

The point I am trying to make is that in the above, for x=0, there is a unique selection that matches reality. There is no such thing as an implicit variable with optmimal value equal to zero. The variable simply does not exist. It is Occam's Razor that requires that this variable should not exist. Thus, your introduction of implicit variables violates this principle.

ScoobyStoo · Feb 26, 2010

Anytime you are not flipping a coin to make a decision you are by definition going to have to make a choice about the data on which you base your decision. This in itself, whether you realise it or not, is either an explicit or implicit optimisation.

In the earlier example close > open the system is optimised by virtue of the fact that the data period selected for the decision making process is a trading session.

Building a trading system cannot be approached in the same way as deriving a fundamental law of physics. At some point you will have to make a choice as to a parameter and then, voila, you are either guessing or you have to optimise.

Code7 · Feb 26, 2010

Let's say you test a large number of different signals, whatever they look like. Now, if there is a way to meet your fitness criteria. Some signals may meet your criteria just by chance. The more you test, the more likely you'll find what you want. Like testing random combinations to guess an unknown password.

There is no way to avoid this when backtesting. It can only be avoided in real trading. The combined actual result of all your brokerage accounts is never subject to data mining bias. Your real money results might be random or insignificant but they are not optimized.

dtrader98 · Feb 26, 2010

Quote from intradaybill:

C>O or C-O>0 is the curve itself, not a curve fit. As soon as you introduce a parameter, like in C-O>x, you have a curve fit, which becomes the curve itself for x=0 (think about it). It is crucial to understand the difference.

There is no point speaking about curve fitting when one uses the curve itself and not another curve that is fitted to the original curve.

There is an issue of why selecting C-O>0 and not O-C>0, which may introduce selection bias, but that is a very different animal than curve fitting.

Furthermore, your argument about implicit parameters is funny. Using the same reasoning, we can assume that in every physical situation there are implicit variables, like for example, in the simple Ohm's law

V = iR, that there is an implicit variable because it can be stated as

V-iR > 0 and it is actually V-iR > x, with x=0.

The point I am trying to make is that in the above, for x=0, there is a unique selection that matches reality. There is no such thing as an implicit variable with optmimal value equal to zero. The variable simply does not exist. It is Occam's Razor that requires that this variable should not exist. Thus, your introduction of implicit variables violates this principle.
More...

Disagree. Funny you should use ohm's law, because it 'is' implicit that the noise is zero. In a 'real' physical system there is always noise present, it is just small enough that there is minimal bias. You don't see this is in basic textbook theory, but I assure you it is there. And comparing ohm's law or any physics law to markets is not really a good comparison . Give me any market based practical rule set O>CL .. the full rule.. if O>CL then what? and what is the objective function? I will show you there is indeed implicit bias, as market data is noisy, much more so than any normal scale physics based law (markets are closer to quantum physics, which at the atomic scale has all kinds of bias/noise/weird behavior present).

If markets were anything like classical physics, there would be no need to add any idiosyncratic term to the end of equations. It is there to explain away the relatively large unavoidable uncertainty present in any market based model. Anyways, I see your point, just don't agree with it.

Mike805 · Feb 26, 2010

Quote from dtrader98:

Disagree. Funny you should use ohm's law, because it 'is' implicit that the noise is zero. In a 'real' physical system there is always noise present, it is just small enough that there is minimal bias. You don't see this is basic textbook theory, but I assure you it is there. And comparing ohm's law or any physics law to markets is not really a good comparison . Give me any market based practical rule set O>CL .. the full rule.. if O>CL then what? and what is the objective function? I will show you there is indeed implicit bias, as market data is noisy, much more so than any normal scale physics based law (markets may be closer to quantum physics, which at the atomic scale has all kinds of bias/noise/weird behavior present).

If markets were anything like physics, there would be no need to add any idiosyncratic term to the end of equations.
It is there to explain away the relatively large unavoidable uncertainty present in any market based model. Anyways, I see your point, just don't agree with it.
More...

Since we're talking about ohm's law here; did you know that ohm's law actually fails at nanoscale?

Would devising a permutation of ohm's law that works at a nanoscale level be considered a curve fit?

dtrader98 · Feb 26, 2010

Quote from Mike805:

Since we're talking about ohm's law here; did you know that ohm's law actually fails at nanoscale?

Would devising a permutation of ohm's law that works at a nanoscale level be considered a curve fit?
More...

There's all kinds of weird behavior at the quantum level. It works at macro levels, because aggregate ensemble behavior... well, behaves semi-predictably well. In real life, we have to evaluate products across 'corners' and design towards typical operation levels, because every product works slightly differently at different temps/voltages/production/etc...

It would be a curve fit, because if small and noisy enough, you can only estimate the aggregate behavior of the molecules... like any other noise based phenomena, there is bias variance tradeoff in any sample. And speaking of brownian motion... ring a bell towards financial modellers (bachelier anyone)?

intradaybill · Feb 26, 2010

Quote from Mike805:

Since we're talking about ohm's law here; did you know that ohm's law actually fails at nanoscale?

Would devising a permutation of ohm's law that works at a nanoscale level be considered a curve fit?
More...

Well, I brought this up as an example and not to be taken literally. Ohm's law is a gross approximation even in macro scale because every not ohmic conductor has inductance and capacitance, so iR is the linear term only. A better law is

V =iZ, where Z is the complex impedance and V and i are the complex scalars.

I think most of you loosely call selection curve-fitting. These are not the same. I will give you an example from real life. One guy walks into a bar and tries to find a girl that matches the looks of a model he knows. He looks at the girls and he estimates which one has the closest fit. This is curve-fitting. If the sample is large enough, he may come very close to his optimal fit. But he will never get the actual model he likes, just a fit.

Another guy walks into a bar and has no particular model in his mind. He watches how the girls behaves and he selects the one he thinks performs better. This is not a fit, this is selection

Pls understand the fundamental difference between the two cases. The former guy will never succeed in finding his actual model, only an optimal fit. The other guy will always get someone given a large diverse sample.

Now someone will ask, what about the case that they both choose the same girl? Isn't that a proof that curve-fitting and selection lead to the same results?

Here is the important detail: if the first guy had to chop away the hands, feet and heads of a few girls to assemble the optimal match, then the resulting girl is not real, save alive. The second guy always picks a real person. This is the difference between devising artificial functions to trade and on the other hand looking at actual price data, like comparing one girl to another and selecting someone as opposed to averaging, for example.

I think I cannot make it simpler than that.

TraderSystem · Mar 12, 2010

TradersStudio has "Optimization feature" and supports at both Session (system) level and the Tradeplan level. Also, TradersStudio is the only product having a built-in "Walk forward" feature.
Gyles has explained the same here:

http://www.elitetrader.com/vb/showthread.php?s=&postid=1710942&highlight=optimization#post1710942
http://www.elitetrader.com/vb/showthread.php?s=&postid=1791978&highlight=optimization#post1791978
http://www.elitetrader.com/vb/showthread.php?s=&postid=1792009&highlight=optimization#post1792009
http://www.elitetrader.com/vb/showt...hlight=walk+forward+TradersStudio#post2556404

Gyles also did an interesting experiment here:
http://www.elitetrader.com/vb/showthread.php?s=&postid=2078942#post2078942
http://www.elitetrader.com/vb/showthread.php?s=&postid=2090564#post2090564
http://www.elitetrader.com/vb/showthread.php?s=&postid=2137580#post2137580
http://www.elitetrader.com/vb/showthread.php?s=&postid=2130065#post2130065
http://www.elitetrader.com/vb/showt...107505&highlight=ThreeMACrossover#post2107505

mizhael · Mar 24, 2010

Quote from TraderSystem:

TradersStudio has "Optimization feature" and supports at both Session (system) level and the Tradeplan level. Also, TradersStudio is the only product having a built-in "Walk forward" feature.
Gyles has explained the same here:

http://www.elitetrader.com/vb/showthread.php?s=&postid=1710942&highlight=optimization#post1710942
http://www.elitetrader.com/vb/showthread.php?s=&postid=1791978&highlight=optimization#post1791978
http://www.elitetrader.com/vb/showthread.php?s=&postid=1792009&highlight=optimization#post1792009
http://www.elitetrader.com/vb/showt...hlight=walk+forward+TradersStudio#post2556404

Gyles also did an interesting experiment here:
http://www.elitetrader.com/vb/showthread.php?s=&postid=2078942#post2078942
http://www.elitetrader.com/vb/showthread.php?s=&postid=2090564#post2090564
http://www.elitetrader.com/vb/showthread.php?s=&postid=2137580#post2137580
http://www.elitetrader.com/vb/showthread.php?s=&postid=2130065#post2130065
http://www.elitetrader.com/vb/showt...107505&highlight=ThreeMACrossover#post2107505
More...

When I tried AmiBroker, it seems also have a Walk Forward Analysis, but it cannot concatenate the out-sample test results all together and compute overall Sharpe ratio on top of that, am I right or not?

TraderZones · Mar 25, 2010

Quote from mizhael:

I heard about "Walk Forward Analysis",

to me it's just like a dynamic rolling-window out-of-sample test coupled with optimization.

The result of this "Walk Forward Analysis" is a set of "optimized" parameters that are suitable for all history(it's robust, but kind of conservative).

Am I right?

Any good ways of avoiding over-fitting or over-optimization?

Thanks!
More...

If you have an idea, you test it. You see if it is robust across different settings. Then you walk-forward test it over a lengthy time period.

But if you are basically taking the best of a number of runs, you are cherry picking.