New performance metric - Could I get your help?

bantam · Mar 17, 2014

Hi everyone,

I'm setting out to create a better single metric to use when backtesting strategies. I've always used the Sharpe Ratio, but it isn't perfect. It fails to consider the max drawdown, number of losing months, etc. Also, I'm less interested in strategies that haven't worked for the last couple of years.

Would you have time to rank some simulated backtests?
http://www-scf.usc.edu/~gfharris/rank.html

My hope is you'll help me rank a good number of charts. Then, I'll try some machine learning algorithms to develop a metric that better predicts the rankings:
http://en.wikipedia.org/wiki/Learning_to_rank
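
To make the idea concrete, here is a minimal sketch of one pairwise learning-to-rank approach: logistic regression on the difference between two charts' metric vectors. It is only an illustration, not the method the project will end up using. The feature choice (Sharpe ratio and max drawdown), the synthetic "hidden taste" used to label pairs, and all function names are assumptions made up for the example.

```python
import math
import random


def train_pairwise(pairs, n_features, epochs=200, lr=0.1):
    """Fit weights w so that sigmoid(w . (a - b)) models P(chart a is preferred).

    pairs: list of (features_a, features_b, a_preferred) tuples (hypothetical format).
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for feats_a, feats_b, a_preferred in pairs:
            diff = [a - b for a, b in zip(feats_a, feats_b)]
            z = sum(wi * di for wi, di in zip(w, diff))
            p = 1.0 / (1.0 + math.exp(-z))
            err = (1.0 if a_preferred else 0.0) - p
            # Stochastic gradient step on the logistic log-likelihood.
            w = [wi + lr * err * di for wi, di in zip(w, diff)]
    return w


def score(w, features):
    """The learned single metric: a weighted sum of the input metrics."""
    return sum(wi * xi for wi, xi in zip(w, features))


def pairwise_accuracy(w, pairs):
    """Fraction of pairs where the learned score agrees with the human choice."""
    hits = sum((score(w, a) > score(w, b)) == a_preferred for a, b, a_preferred in pairs)
    return hits / len(pairs)


def make_synthetic_pairs(n, seed=0):
    """Made-up data: each chart is [sharpe, max_drawdown]; the ranker's
    hidden taste (an assumption) rewards Sharpe and penalizes drawdown."""
    rng = random.Random(seed)

    def chart():
        return [rng.uniform(0.0, 2.0), rng.uniform(0.0, 0.5)]

    def hidden_taste(f):
        return f[0] - 2.0 * f[1]

    pairs = []
    for _ in range(n):
        a, b = chart(), chart()
        pairs.append((a, b, hidden_taste(a) > hidden_taste(b)))
    return pairs


if __name__ == "__main__":
    pairs = make_synthetic_pairs(200)
    w = train_pairwise(pairs, n_features=2)
    print("weights:", w)
    print("pairwise accuracy:", pairwise_accuracy(w, pairs))
```

With real rankings, the accuracy should be measured on held-out pairs rather than on the training pairs as done here.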

As I've now gone back to school, one of my goals is to publish the results. So, in the end, I will present the method and accuracy in a formal whitepaper. I will post the finished algorithm in MATLAB and perhaps a few other languages. Of course, since I'm asking the broader community to help with the ranking, I will post the charts, raw data, and all the rankings on the website upon the completion of this project.

I'm interested in hearing your feedback. In my reading in EliteTrader, I've found the following metrics discussed, and I'm curious to see how well each can predict the community rankings:
Sharpe Ratio
Profit Factor
max drawdown
percent of trades profitable
average winning trade return
average losing trade return
ratio avg win / avg loss
max consecutive winners
max consecutive losers
largest winning trade
largest losing trade
profit per month
max time to recover
average MAE
average MFE
average ETD
Recovery factor
CAR/Maxdd
profit factor
rr ratio
ulcer index
K Ratio

I sincerely appreciate your help, and I know your time is valuable.
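
For reference, a rough sketch of how three of the listed metrics could be computed. The conventions are assumptions: drawdowns are measured as a fraction of the running peak over the whole history (the ulcer index is often computed over a fixed lookback window instead), and the equity curve and trade list are made-up numbers.

```python
import math


def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def profit_factor(trade_pnls):
    """Gross profit divided by gross loss (infinite if there are no losses)."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    return gross_profit / gross_loss if gross_loss > 0 else math.inf


def ulcer_index(equity):
    """Root mean square of percentage drawdowns from the running peak."""
    peak = equity[0]
    squares = []
    for value in equity:
        peak = max(peak, value)
        squares.append((100.0 * (peak - value) / peak) ** 2)
    return math.sqrt(sum(squares) / len(squares))


if __name__ == "__main__":
    equity = [100, 110, 99, 105, 120]  # hypothetical equity curve
    trades = [10, -11, 6, 15]          # hypothetical per-trade P&L
    print("max drawdown:", max_drawdown(equity))
    print("profit factor:", profit_factor(trades))
    print("ulcer index:", ulcer_index(equity))
```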

bantam · Mar 17, 2014

Thank you very much to the first chart ranker. I ran some numbers, and it seems that you chose the chart with the higher Sharpe Ratio 81% of the time. I wasn't sure what to expect in that regard, and it will be interesting to see if that holds for others as well.
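
A sketch of how an agreement figure like this can be computed. It assumes monthly returns, a zero risk-free rate, and a sample standard deviation; the pair format and the sample data are hypothetical, not the actual ranking data.

```python
import math
import statistics


def sharpe_ratio(returns, periods_per_year=12, risk_free=0.0):
    """Annualized Sharpe ratio from per-period returns (needs non-constant returns)."""
    excess = [r - risk_free for r in returns]
    return math.sqrt(periods_per_year) * statistics.mean(excess) / statistics.stdev(excess)


def agreement_rate(pairs, metric):
    """Fraction of pairs where the ranker picked the chart with the higher metric.

    pairs: list of (returns_a, returns_b, picked_a) tuples (hypothetical format).
    """
    hits = 0
    for returns_a, returns_b, picked_a in pairs:
        hits += (metric(returns_a) > metric(returns_b)) == picked_a
    return hits / len(pairs)


if __name__ == "__main__":
    smooth = [0.02, 0.01, 0.03, 0.02]    # made-up monthly returns
    choppy = [0.05, -0.04, 0.06, -0.03]
    pairs = [
        (smooth, choppy, True),
        (smooth, choppy, False),  # a ranker who preferred the lower-Sharpe chart
        (choppy, smooth, False),
        (smooth, choppy, True),
    ]
    print("agreement with Sharpe:", agreement_rate(pairs, sharpe_ratio))
```

Any other metric that maps a return series to a number can be passed in place of `sharpe_ratio` to compare metrics on the same pairs.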

Sergio77 · Mar 20, 2014

MAR is what the industry cares about.
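
For anyone unfamiliar with it, the MAR ratio is commonly defined as the compound annual growth rate divided by the maximum drawdown since inception. A minimal sketch under that definition follows; the equity curve and the two-year span are made-up inputs.

```python
def cagr(equity, years):
    """Compound annual growth rate over the whole equity curve."""
    return (equity[-1] / equity[0]) ** (1.0 / years) - 1.0


def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def mar_ratio(equity, years):
    """CAGR divided by max drawdown (infinite if there was no drawdown)."""
    drawdown = max_drawdown(equity)
    return cagr(equity, years) / drawdown if drawdown > 0 else float("inf")


if __name__ == "__main__":
    equity = [100, 120, 90, 130, 144]  # hypothetical equity curve spanning 2 years
    print("MAR:", mar_ratio(equity, years=2))
```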

bantam · Mar 28, 2014

I'm getting more excited about this project. I've ranked around 350 charts, and I think I'm gaining more insight into myself and what kind of backtests I find more attractive. Take a look at the pair of charts below. I think I actually like the second one better, even though it has a devastating 2-year drawdown. The first chart doesn't seem to do well over the last year, while the second is on fire at the end. The second is also such a smooth line that it makes me wonder if I could time it: the loss was gradual and smooth. It gives me the feeling that it would be profitable right now, and that if it ever turns bad again, at least it will do so gradually, giving me time to shut down. What do others think? If you had to bet on the near-future performance of these two charts, where would you put your money? That's the kind of nuance I'm hoping to capture with a new data mining metric.

goodgoing · Apr 10, 2014

Suppose a method is a winner 100% of the time. Would anyone look at any metric other than this fact?

Deep Algorithms · Apr 11, 2014

I am not sure that looking for a best single metric is a well-posed problem, but a good return-risk measure that avoids a number of the drawbacks of the Sharpe ratio is the Return Retracement Ratio.

There is an old, but very relevant, article that I think will help with your work:

J. Schwager, "Alternative to Sharpe Ratio Better Measure of Performance," Futures, pp. 56-57 (1985).

If you can't find this article, it was also adapted into a book chapter titled "Better Measure of Performance", but I can't recall the name of the book. I know the .pdf can be found online though.
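
A rough sketch of the Return Retracement Ratio as I understand Schwager's definition: the average annual compounded return divided by the average maximum retracement (AMR), where each point's maximum retracement is the larger of its drop from the prior equity peak and its drop to the lowest subsequent equity. Please check it against the article before relying on it. The function name and sample data are made up.

```python
def return_retracement_ratio(equity, years):
    """Annualized compounded return divided by average maximum retracement."""
    n = len(equity)
    annual_return = (equity[-1] / equity[0]) ** (1.0 / years) - 1.0

    # Highest equity on or before each point.
    peaks = []
    peak = equity[0]
    for value in equity:
        peak = max(peak, value)
        peaks.append(peak)

    # Lowest equity on or after each point.
    lows = [0.0] * n
    low = equity[-1]
    for i in range(n - 1, -1, -1):
        low = min(low, equity[i])
        lows[i] = low

    # Each point's worst retracement: from the prior peak, or to a later low.
    retracements = [
        max((pk - e) / pk, (e - lo) / e)
        for e, pk, lo in zip(equity, peaks, lows)
    ]
    amr = sum(retracements) / n
    return annual_return / amr if amr > 0 else float("inf")


if __name__ == "__main__":
    equity = [100, 120, 90, 130, 144]  # hypothetical equity curve spanning 2 years
    print("RRR:", return_retracement_ratio(equity, years=2))
```

Unlike a max-drawdown ratio, every point on the curve contributes to the denominator, so a single bad stretch does not dominate the result.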

kut2k2 · Apr 11, 2014

bantam said:
Hi everyone,

I'm setting out to create a better single metric to use when backtesting strategies. I've always used the Sharpe Ratio, but it isn't perfect. It fails to consider the max drawdown, number of losing months, etc. Also, I'm less interested in strategies that haven't worked for the last couple of years.

... In my reading in EliteTrader, I've found the following metrics discussed, and I'm curious to see how well each can predict the community rankings:
Sharpe Ratio
Profit Factor
max drawdown
percent of trades profitable
average winning trade return
average losing trade return
ratio avg win / avg loss
max consecutive winners
max consecutive losers
largest winning trade
largest losing trade
profit per month
max time to recover
average MAE
average MFE
average ETD
Recovery factor
CAR/Maxdd
profit factor
rr ratio
ulcer index
K Ratio

I sincerely appreciate your help, and I know your time is valuable.

System Achievement Score

kut2k2 · Apr 11, 2014

goodgoing said:
Suppose a method is a winner 100% of the time. Would anyone look at any metric other than this fact?

No, but so what? That's like asking whether you would have to work for a living if you inherited a huge fortune. Again, the answer is no, but so what? For those of us who aren't rich and who don't have a holy-grail trading strategy, a job and a good performance metric help a lot.

bantam · Apr 25, 2014

Thanks, Deep. It looks like Schwager talks about RRR in his book, "Technical Analysis." I'll read it more closely and definitely include RRR in my testing.

kut2k2, thanks for calling my attention to SAS. I wasn't aware of the performance measure discussion in that thread. I'll code it up and add it to the list.

Here's an update on this project. I've started reading the related work in this field, and I'm kind of amazed at how many performance metrics there are. One paper mentioned there were over one hundred. It looks like many of them are very similar. Those who don't like the Sharpe Ratio reject it because returns are often not quite normally distributed, so they consider higher moments or some other measure of risk. In this paper, the authors test 13 metrics and conclude that they all give nearly the same rankings, so we should just stick with the Sharpe Ratio:
Does the choice of performance measure influence the evaluation of hedge funds? (Eling and Schuhmacher, 2007)

The approach I'm taking seems quite different from what I've read about so far. They try to improve the performance measure theoretically for traders to use. I'm going the other direction, where I'm seeing what traders think and trying to emulate it algorithmically. There seems to be room for improvement. I'm finding that the Sharpe Ratio has an accuracy of less than 70% in the pairwise ranking task I set up. That means more than 30% of the time, one of you has shown a preference for the chart with the lower Sharpe, which I find noteworthy. I hope it isn't due to pranksters.

May I ask again for help labeling more data? Right now I have 387 community pairwise rankings. Separately, I've done a few hundred myself, but those are more for sanity checks and comparisons. I don't want to bias the results with my own data. Again, all the rankings and raw data will be made public for your own experimentation.

kut2k2 · Apr 25, 2014

bantam said:
kut2k2, thanks for calling my attention to SAS. I wasn't aware of the performance measure discussion in that thread. I'll code it up and add it to the list.

Here's an update on this project. I've started reading the related work in this field, and I'm kind of amazed at how many performance metrics there are. One paper mentioned there were over one hundred. It looks like many of them are very similar. Those who don't like the Sharpe Ratio reject it because returns are often not quite normally distributed, so they consider higher moments or some other measure of risk. In this paper, the authors test 13 metrics and conclude that they all give nearly the same rankings, so we should just stick with the Sharpe Ratio:
Does the choice of performance measure influence the evaluation of hedge funds? (Eling and Schuhmacher, 2007)

The approach I'm taking seems quite different from what I've read about so far. They try to improve the performance measure theoretically for traders to use. I'm going the other direction, where I'm seeing what traders think and trying to emulate it algorithmically. There seems to be room for improvement. I'm finding that the Sharpe Ratio has an accuracy of less than 70% in the pairwise ranking task I set up. That means more than 30% of the time, one of you has shown a preference for the chart with the lower Sharpe, which I find noteworthy. I hope it isn't due to pranksters.

It's probably because the Sharpe ratio is a poor measure of performance, which in turn is because standard deviation is a poor measure of risk. Standard deviation was invented by statisticians to measure uncertainty, not risk. Ever since a lazy economist decided to use standard deviation as his risk metric, we have had decades of econometric nonsense. Risk is the potential for loss. Standard deviation increases with both gains and losses, so it should be no surprise that opinions diverge significantly from what the Sharpe ratio says about which chart is 'better'.
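
A small sketch of that last point, comparing standard deviation with a downside deviation that counts only returns below a target (the denominator used in Sortino-style ratios). The choice of downside deviation, the zero target, and the return series are assumptions for illustration, not something proposed in this thread.

```python
import math
import statistics


def downside_deviation(returns, target=0.0):
    """Root mean square of shortfalls below the target; gains contribute zero."""
    shortfalls = [min(0.0, r - target) for r in returns]
    return math.sqrt(sum(s * s for s in shortfalls) / len(returns))


if __name__ == "__main__":
    base = [0.01, -0.01, 0.01, -0.01, 0.01, -0.01]      # made-up monthly returns
    one_big_win = [0.01, -0.01, 0.01, -0.01, 0.20, -0.01]

    for name, series in (("base", base), ("one big win", one_big_win)):
        print(
            name,
            "| std dev:", statistics.pstdev(series),
            "| downside dev:", downside_deviation(series),
        )
```

Replacing one small gain with a large one raises the standard deviation, and so penalizes the Sharpe ratio's denominator, while the downside deviation stays the same because the losses are unchanged.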