Influence of number of trades for system optimization

BookTrader · Nov 23, 2011

Hi there,

In optimizing parameters for any trading system, to my understanding, one always needs to optimize some (atomic) value. This could be e.g. NetPnL, Sharpe Ratio, Calmar Ratio, ... (you name it).

My question is the following:

In any optimization there is a danger of excessive overfitting. However, this danger should decrease with the number of trades: the same number of parameters spread over more trades has less chance of picking out outliers. In the extreme case, with 2 trades, picking the high and the low should be easy, but with 1000 trades, picking spurious highs and lows is rather more difficult. Hence, given two identical results with different numbers of trades, I would expect the one with more trades to diverge less in a forward test than the one with fewer trades.

Now for the statisticians: Is there a way to put this into a formula? And if yes, how should I modify my objective function?

Say, e.g., my original objective function is PnL. Then obviously, given two identical PnLs, I will prefer the one with more trades. But should the adjusted objective function be PnL * Turnover, or PnL * sqrt(Turnover), or something else entirely?
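One candidate for the "something else" is the t-statistic of the mean per-trade PnL, which works out to (mean / standard deviation) * sqrt(number of trades). The sketch below is only an illustration under the assumption that trades are independent and identically distributed; the function name `t_stat_objective` and the sample trade lists are made up for this example and do not come from any trading platform.

```python
import math
from statistics import mean, stdev


def t_stat_objective(trade_pnls):
    """t-statistic of the mean per-trade PnL: mean / (stdev / sqrt(n)).

    Assumes trades are i.i.d. (an assumption, rarely exactly true).
    """
    n = len(trade_pnls)
    if n < 2:
        return 0.0  # a single trade says nothing about dispersion
    s = stdev(trade_pnls)  # sample standard deviation (n - 1 denominator)
    m = mean(trade_pnls)
    if s == 0:
        # Degenerate case: every trade identical, the ratio is unbounded.
        return math.copysign(math.inf, m) if m != 0 else 0.0
    return m / s * math.sqrt(n)


if __name__ == "__main__":
    # Hypothetical systems with the same total PnL of 100.
    few_trades = [40, -10, 50, 20]            # 4 trades
    many_trades = [10, -2.5, 12.5, 5] * 4     # 16 trades, same shape, smaller size
    print(t_stat_objective(few_trades))
    print(t_stat_objective(many_trades))
```

Note that the sqrt(N) factor multiplies the per-trade mean/stdev ratio, not the raw PnL, so this is closer to "per-trade Sharpe * sqrt(trades)" than to "PnL * sqrt(Turnover)".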

Does any of this make sense?

Thanks

braincell · Nov 23, 2011

I see where you're going. I posted a similar question not too long ago, here:

http://www.elitetrader.com/vb/showthread.php?s=&threadid=231033

Short answer: it's probably best to use Sharpe Ratio. The main reason is StdDev. It will also give you much better results if the sample is larger (i.e. more trades), so stddev does it all as if by magic. Systems with a higher SR and lower average profits overall can be scaled up most of the time, so SR is imho the best way to pick them. You can also see some other suggestions in the aforementioned thread.

BookTrader · Nov 23, 2011

Hi Braincell,

I think I disagree. What you are referring to is imo the objective function itself. What that should be, and whether Sharpe is superior to PnL, is a different question. And it would go on and on. For example, maybe Sharpe Ratio * Calmar Ratio might be superior to Sharpe Ratio?

My question goes further than that. Or, to rephrase it for your suggestion: given two results with identical Sharpe ratios, the one with more trades is less likely to be overfit and more likely to hold up in an out-of-sample test. How does one reflect this mathematically? I think this isn't addressed by your answer.
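One way to express "same Sharpe ratio, more trades, more trust" is through the standard error of the estimated Sharpe ratio. Under the assumption of i.i.d. normally distributed per-trade returns, a commonly used large-sample approximation is SE ≈ sqrt((1 + SR²/2) / N), where SR is the per-trade (not annualized) Sharpe ratio and N the number of trades. The sketch below penalizes the estimate by a multiple of that standard error; the function names and the choice of `z` are assumptions made for illustration only.

```python
import math


def sharpe_standard_error(sharpe_per_trade, n_trades):
    """Approximate standard error of a per-trade Sharpe ratio estimate.

    Large-sample approximation that assumes i.i.d. normal returns.
    """
    return math.sqrt((1.0 + 0.5 * sharpe_per_trade ** 2) / n_trades)


def sharpe_lower_bound(sharpe_per_trade, n_trades, z=1.0):
    """Sharpe ratio minus z standard errors (z is an arbitrary choice here)."""
    return sharpe_per_trade - z * sharpe_standard_error(sharpe_per_trade, n_trades)


if __name__ == "__main__":
    # Two hypothetical backtests with the same per-trade Sharpe ratio of 0.2.
    for n in (50, 1000):
        print(n, sharpe_lower_bound(0.2, n))
```

Ranking candidates by such a lower bound instead of the raw Sharpe ratio favors the result with more trades when the point estimates are equal. It does not correct for the number of parameter sets that were tried.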

intradaybill · Nov 24, 2011

Quote from BookTrader:

Does any of this make sense?

No, sorry to say that. You are making many assumptions. Higher number of trades does not directly equate to higher significance. Buy and hold has one trade with significance 100%.

As soon as you vary some parameter to maximize some objective function, your system is subject to selection bias.

Like pointing to the guy who won the lottery and claiming lottery is a positive expectation game. For whom? For the winners? Of course, but not for the many losers.

Do you see the problem? You have to forget how the word "optimization" sounds. Otherwise you got no chance, sorry.

Happy Thanksgiving to all Americans.

BookTrader · Nov 24, 2011

I quite disagree. Think of the bias-variance trade off of model estimation. With a model that has sufficient parameters to lend itself more to over than to under estimation, one would expect to have decent bias on the parameters but questionable variance. This variance should be reduced with an increased amount of observations.

I fail to follow your argument. A buy and hold strategy test with 1 trade certainly, in my opinion, has anything but 100% significance. On the contrary, it would be possible to fit the model perfectly, giving a bias of zero and some insane variance. Regarding selection bias: of course there is bias; it is a question of reducing it. And I fail to understand even the point of the rest of your post. What has this got to do with positive expectation for a lottery winner? (huh?) The "sound" of optimization?

intradaybill · Nov 24, 2011

Quote from BookTrader:

I quite disagree. Think of the bias-variance trade off of model estimation. With a model that has sufficient parameters to lend itself more to over than to under estimation, one would expect to have decent bias on the parameters but questionable variance. This variance should be reduced with an increased amount of observations.

I fail to follow your argument. A buy and hold strategy test with 1 trade certainly, in my opinion, has anything but 100% significance. On the contrary, it would be possible to fit the model perfectly, giving a bias of zero and some insane variance. Regarding selection bias: of course there is bias; it is a question of reducing it. And I fail to understand even the point of the rest of your post. What has this got to do with positive expectation for a lottery winner? (huh?) The "sound" of optimization?

If you know all these things then why are you asking?

When you are optimizing forget about significance and number of trades. It is selection bias you should worry about.

Increasing the number of observations of models with selection bias does nothing to reduce the selection bias.

Or is this not clear?

BookTrader · Nov 24, 2011

What do you mean by number of observations of models? What is increased is the number of trades, i.e. the number of samples. It makes intuitive sense that more trades against the same model parameters mean higher significance for the model parameter estimates.

So, no, what you are saying is not clear. I am even struggling to understand why you don't see this. Given two strategies, one with 1000 trades at a profit of 10 each and one with 1 trade at a profit of 1000, and two model parameters that have been optimized in each case, would you really not prefer the first model?

Or do you not believe that backtesting over a longer period of time (which comes down to the same thing, namely increasing the number of observations) reduces the variance of your estimates?

The reason I am asking is not to discuss whether or not this is the case (I am quite certain it is), but to discuss ways to model it, as that is not straightforward. It depends, e.g., on the degrees of freedom in your model to begin with.
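The disagreement in this thread (does a larger number of trades reduce the damage done by selection bias?) can be probed with a small simulation. The sketch below is a toy model, not a backtest: it assumes every candidate parameter set has zero true edge, that per-trade PnLs are i.i.d. standard normal, and that candidates are independent of each other (parameter sets tested on the same price series usually are not). It picks the candidate with the best in-sample mean and reports how large that spurious mean is on average. All names and default values are hypothetical.

```python
import random


def selection_bias(n_trades, n_candidates=20, n_reps=100, seed=0):
    """Average in-sample mean PnL of the best of several zero-edge candidates.

    Every candidate has a true mean of zero, so whatever this returns is
    pure selection bias from picking the maximum.
    """
    rng = random.Random(seed)
    total_of_best = 0.0
    for _ in range(n_reps):
        best = max(
            sum(rng.gauss(0.0, 1.0) for _ in range(n_trades)) / n_trades
            for _ in range(n_candidates)
        )
        total_of_best += best
    return total_of_best / n_reps


if __name__ == "__main__":
    for n in (10, 400):
        print(n, selection_bias(n))
```

In this toy setting the best candidate's in-sample mean stays positive at every sample size, so the selection bias never vanishes, but it shrinks as the number of trades grows. How far that carries over to real, correlated parameter sweeps is an open question that the sketch does not answer.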