Confidence intervals in strategy performance

nonlinear5 · Jan 15, 2007

I'd like to know if there are any standard measures of confidence bounds for a given strategy's performance. For example, suppose I have a strategy that shows a profit factor of 2.0 over the 100 trades that it made during the last 3 months. How do I know that those results are not due to chance?

Some things are intuitive: the larger the number of trades, and the longer the test period, the higher the confidence level. But I am looking for some standard statistical measure to do this kind of evaluation.

vikana · Jan 15, 2007

Here's an excellent thread started by EricP:

http://www.elitetrader.com/vb/showthread.php?s=&threadid=36083&highlight=confidence+intervals

One of my all-time favorites. His views definitely changed how I activate and deactivate systems and stocks.

granville · Jan 15, 2007

I use a very simple test:

If I discovered it X years ago, would I have made money since?

More technically, if the most profitable (optimized) parameters do not drift or change over time, then I am confident enough to trade the system going forward. I determine this by optimizing over the first half of the data and confirming that those parameters are also the most profitable ones for the second half of the data.
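For anyone who wants to try this split-half check, here is a minimal Python sketch. It is an illustration under assumptions, not granville's actual procedure: the names `best_parameter`, `parameters_are_stable` and `momentum_profit` are hypothetical, and the toy momentum rule only stands in for whatever system is being tested.

```python
from typing import Callable, Sequence, Tuple


def best_parameter(data: Sequence[float],
                   candidates: Sequence[int],
                   score: Callable[[Sequence[float], int], float]) -> int:
    """Return the candidate parameter with the highest score on `data`."""
    return max(candidates, key=lambda p: score(data, p))


def parameters_are_stable(data: Sequence[float],
                          candidates: Sequence[int],
                          score: Callable[[Sequence[float], int], float]
                          ) -> Tuple[bool, int, int]:
    """Optimize on each half separately and compare the winning parameters."""
    mid = len(data) // 2
    first_best = best_parameter(data[:mid], candidates, score)
    second_best = best_parameter(data[mid:], candidates, score)
    return first_best == second_best, first_best, second_best


def momentum_profit(prices: Sequence[float], lookback: int) -> float:
    """Toy score (an assumption): go long if price is above its level
    `lookback` bars ago, otherwise short, and hold for one bar."""
    profit = 0.0
    for i in range(lookback, len(prices) - 1):
        direction = 1 if prices[i] > prices[i - lookback] else -1
        profit += direction * (prices[i + 1] - prices[i])
    return profit
```

Calling `parameters_are_stable(prices, [5, 10, 20], momentum_profit)` on a price list returns whether the same lookback won in both halves, plus the two winners. Requiring an exact match is strict; a looser variant could accept neighboring parameter values.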

nonlinear5 · Jan 15, 2007

Quote from vikana:

Here's an excellent thread started by EricP:

http://www.elitetrader.com/vb/showthread.php?s=&threadid=36083&highlight=confidence+intervals

One of my all-time favorites. His views definitely changed how I activate and deactivate systems and stocks.

Thanks, that's a good read. The discussion in that thread is around the formula:
X = abs(Avg Profit) * sqrt(Number of Trades) / (Std Dev of Profits)

X then maps to a confidence percentage.
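As a rough sketch of that calculation in Python: the formula is the one quoted above, but the mapping from X to a percentage is an assumption here (a two-sided lookup on the standard normal distribution), as is the use of the sample standard deviation. The function name `confidence_from_trades` is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import Sequence, Tuple


def confidence_from_trades(profits: Sequence[float]) -> Tuple[float, float]:
    """Compute X = abs(avg profit) * sqrt(N) / std dev, then map X to a
    confidence level. The normal-distribution mapping is an assumption."""
    n = len(profits)
    x = abs(mean(profits)) * math.sqrt(n) / stdev(profits)  # sample std dev
    confidence = 2.0 * NormalDist().cdf(x) - 1.0  # two-sided, in [0, 1)
    return x, confidence


# Per-trade profits only; no dates or holding periods enter the formula.
x, confidence = confidence_from_trades([120.0, -80.0, 45.0, 200.0, -30.0, 60.0])
print(f"X = {x:.2f}, confidence = {confidence:.1%}")
```

Note that the only inputs are the per-trade profits, so the result is the same whether the trades took one day or three years.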

One thing that jumped out at me immediately is that the length of the test period is nowhere in this calculation. So, if I have a system that made 200 trades in a single day, then my X may indeed indicate a 95% confidence level (or maybe even higher). But it feels like something is missing if it's derived from just one day of backtesting.

vikana · Jan 15, 2007

Time is not directly involved; it's just the individual trades. So 200 trades, over however long it takes.

NoWorries · Jan 15, 2007

It's possible to compute a confidence interval for any statistic (max DD, profit factor, etc.), but I prefer not to rely on normality assumptions (as the formula mentioned above seems to do). I've posted an Excel spreadsheet in another thread that uses non-parametric bootstrapping (and hence does not rely on normality assumptions) to compute a confidence interval for the expected return, but you can easily modify it for any other statistic.

http://www.elitetrader.com/vb/showthread.php?s=&postid=1246313#post1246313
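The spreadsheet itself is not reproduced here, but the general idea of a non-parametric (percentile) bootstrap can be sketched in Python. This is a generic illustration, not a port of the spreadsheet; `bootstrap_ci` and `profit_factor` are hypothetical names, and the resample count, confidence level and seed are arbitrary choices.

```python
import random
from statistics import mean
from typing import Callable, Sequence, Tuple


def profit_factor(profits: Sequence[float]) -> float:
    """Gross profit divided by gross loss; infinite if there are no losses."""
    gross_profit = sum(p for p in profits if p > 0)
    gross_loss = -sum(p for p in profits if p < 0)
    return gross_profit / gross_loss if gross_loss > 0 else float("inf")


def bootstrap_ci(profits: Sequence[float],
                 statistic: Callable[[Sequence[float]], float] = mean,
                 n_resamples: int = 2000,
                 level: float = 0.95,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap: resample trades with replacement, recompute the
    statistic each time, and read the interval off the sorted estimates."""
    rng = random.Random(seed)
    n = len(profits)
    estimates = sorted(
        statistic(rng.choices(profits, k=n)) for _ in range(n_resamples)
    )
    tail = (1.0 - level) / 2.0
    lower = estimates[int(tail * n_resamples)]
    upper = estimates[int((1.0 - tail) * n_resamples) - 1]
    return lower, upper


trades = [120.0, -80.0, 45.0, 200.0, -30.0, 60.0, -95.0, 150.0]
print("expected return CI:", bootstrap_ci(trades))
print("profit factor CI:  ", bootstrap_ci(trades, statistic=profit_factor))
```

Swapping in a different statistic is just a matter of passing another function. One caveat: resampling individual trades treats them as independent, which may not hold if trades cluster or overlap in time.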

In most of the discussions here on ET, the concept of "statistical power" (Google it) is omitted, which is unfortunate. With only a small sample, you might find a profit factor that is not significantly different from breakeven (1.0) simply because you have too few observations. On the other hand, if your sample size is extremely large, pretty much everything will show statistical significance.
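To put a rough number on the sample-size point, here is a minimal Python sketch of the textbook power calculation for a two-sided test on the mean profit per trade. It does lean on a normal approximation, so treat it as a ballpark only; the function name `trades_needed` and the defaults (5% significance, 80% power) are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def trades_needed(avg_profit: float, std_profit: float,
                  alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of trades needed to detect a true average profit
    of `avg_profit` (per-trade std dev `std_profit`) with a two-sided test
    at significance `alpha` and the requested power. Normal approximation."""
    effect = abs(avg_profit) / std_profit          # edge in std-dev units
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / effect) ** 2)


# A thin edge relative to trade-to-trade noise needs far more trades.
for ratio in (0.5, 0.2, 0.1):
    print(f"avg/std = {ratio}: about {trades_needed(ratio, 1.0)} trades")
```

The required count grows with the inverse square of the edge-to-noise ratio, which is why a small sample can easily miss a real but modest edge.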

ptunic · Jan 15, 2007

I think some of these tools can be very helpful. That said let me add some other thoughts as well (I know this is a bit contradictory).

In some cases I think it can be misleading to focus too precisely on some of these numbers, particularly since key variables cannot be predicted accurately: will someone else, say tomorrow or next year, find the same edge and reduce the potential gains? Will the markets change for other reasons and also reduce the perceived edge? So using very precise numbers can be misleading, IMO. For example, if I look across my room at the wall, it is misleading to say I think it is 8.34512 feet away. That implies a level of precision I don't have. It would be more accurate for me to say I think the wall is 8 feet away, or even 5-15 feet away from me. You actually gain, not lose, information by being less precise, if that makes sense.

I think what is far more important than which strategy measurement formula is used (whether it is level of confidence or something as simple as profit factor or Sharpe ratio) is the process used to identify the edge. Specifically, were extensive out-of-sample tests used and, even more importantly, is there a valid reason for the edge? I personally have much more confidence in edges with logical, easily explainable rationales than in strategies that show great performance but have a questionable basis.

Avid_Consumer · Jan 15, 2007

On what basis do you estimate a capacity of 100mm?