New performance metric - Could I get your help?

minmike · May 4, 2014

Seems problematic. I stopped at 50. Any more than that is asking too much.

Also, sometimes both look investable, sometimes neither. I would look at changing the methodology.

bantam · May 5, 2014

minmike, thank you for doing 50. That puts you in the top 4 contributors. As to your other point, it comes down to a question of the easiest way to extract knowledge from traders. Pairwise ranking can sometimes be tough, if the two charts look equivalent. The alternative, though, is to have you give some sort of numeric score for individual charts, off the top of your head. I believe that would be much more difficult. It would be hard to make consistent judgements.

kut2k2, I think I've got your System Achievement Score working. I'm going off this post: http://www.elitetrader.com/vb/showpost.php?p=3963107&postcount=170
It simplifies a little for my project. E is always equal to 1, since I've normalized the return for all simulations. I left off the 4, since it won't alter the rankings. And I set mant = N, since I don't really have a number of trades for these simulations. I suppose you could say that I'm assuming a single trade every day. My MATLAB code basically simplifies to:
Code:
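kelly = @(k) sum(ret./(1+k*ret));              % derivative of sum(log(1+k*ret)); ret = vector of per-period returns
k = fzero(kelly, 20);                          % Kelly fraction: root of that derivative, search started at k = 20
PF = sum(ret(ret > 0)) / sum(-ret(ret < 0));   % profit factor: gross gains / gross losses
SAS = k * PF;                                  % simplified score (E = 1, factor of 4 dropped)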
Typical values of k are around 22, PF is around 1.4, which puts SAS typically around 30. Have I made any mistakes?
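
For readers without MATLAB, here is a rough Python sketch of the same simplified calculation. It is only an illustration under assumptions: bisection stands in for fzero, the bracket assumes a positive total return and at least one losing period, the function names are made up for this example, and the sample returns are invented.

```python
def kelly_fraction(ret, tol=1e-10):
    """Solve sum(r / (1 + k*r)) = 0 for k by bisection (stand-in for MATLAB's fzero)."""
    if sum(ret) <= 0 or min(ret) >= 0:
        raise ValueError("need a positive total return and at least one losing period")

    def f(k):
        return sum(r / (1.0 + k * r) for r in ret)

    lo = 0.0                                  # f(0) = sum(ret) > 0
    hi = -1.0 / min(ret) * (1.0 - 1e-9)       # just below the pole, where f is very negative
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid                          # root lies above mid (f is decreasing in k)
        else:
            hi = mid
    return 0.5 * (lo + hi)


def profit_factor(ret):
    """Gross gains divided by gross losses."""
    gains = sum(r for r in ret if r > 0)
    losses = sum(-r for r in ret if r < 0)
    return gains / losses


def sas_simplified(ret):
    """k * PF, i.e. the simplified score from the post (E = 1, factor of 4 dropped)."""
    return kelly_fraction(ret) * profit_factor(ret)


# Made-up per-period returns, purely for illustration.
ret = [0.02, -0.01, 0.015, -0.005, 0.01, -0.012, 0.008]
k = kelly_fraction(ret)
print(k, profit_factor(ret), sas_simplified(ret))
```

Because the function being solved is strictly decreasing in k between 0 and the pole at -1/min(ret), the bracket contains exactly one root, so the sketch does not depend on a starting guess such as 20.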

Here's the link for ranking more charts:
http://www-scf.usc.edu/~gfharris/rank.html

kut2k2 · May 7, 2014

bantam said:
kut2k2, I think I've got your System Achievement Score working. I'm going off this post: http://www.elitetrader.com/vb/showpost.php?p=3963107&postcount=170
It simplifies a little for my project. E is always equal to 1, since I've normalized the return for all simulations. I left off the 4, since it won't alter the rankings. And I set mant = N, since I don't really have a number of trades for these simulations. I suppose you could say that I'm assuming a single trade every day. My MATLAB code basically simplifies to:
Code:
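kelly = @(k) sum(ret./(1+k*ret));              % derivative of sum(log(1+k*ret)); ret = vector of per-period returns
k = fzero(kelly, 20);                          % Kelly fraction: root of that derivative, search started at k = 20
PF = sum(ret(ret > 0)) / sum(-ret(ret < 0));   % profit factor: gross gains / gross losses
SAS = k * PF;                                  % simplified score (E = 1, factor of 4 dropped)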
Typical values of k are around 22, PF is around 1.4, which puts SAS typically around 30. Have I made any mistakes?
Yes.

You cannot normalize expectation. That makes it meaningless. What's the point of that?

bantam · May 7, 2014

I suppose I've seen a few strategies that were somewhat less interesting to me because the return was only a little above the risk-free rate. But generally, I don't put much weight on the intrinsic leverage of a strategy. I've never used up what leverage I've had access to. If that's the core of SAS, then I suppose this experiment isn't appropriate to test it.

kut2k2 · May 7, 2014

bantam said:
I suppose I've seen a few strategies that were somewhat less interesting to me because the return was only a little above the risk-free rate. But generally, I don't put much weight on the intrinsic leverage of a strategy. I've never used up what leverage I've had access to. If that's the core of SAS, then I suppose this experiment isn't appropriate to test it.

Leverage? I've never seen a legitimate k value that approaches anything that can be classified as "leverage". If you're using the ludicrous CK formula to calculate your k values, no wonder your results are so off. Go to the Trade Management forum, I've written about this extensively there.

murray t turtle · May 7, 2014

bantam said:
Hi everyone,

I'm setting out to create a better single metric to use when backtesting strategies. I've always used the Sharpe Ratio, but it isn't perfect. It fails to consider the max drawdown, number of losing months, etc. Also, I'm less interested in strategies that haven't worked for the last couple of years.

Would you have time to rank some simulated backtests?
http://www-scf.usc.edu/~gfharris/rank.html

My hope is you'll help me rank a good number of charts. Then, I'll try some machine learning algorithms to develop a metric that better predicts the rankings:
http://en.wikipedia.org/wiki/Learning_to_rank

As I've now gone back to school, one of my goals is to publish the results. So, in the end, I will present the method and accuracy in a formal whitepaper. I will post the finished algorithm in MATLAB and perhaps a few other languages. Of course, since I'm asking the broader community to help with the ranking, I will post the charts, raw data, and all the rankings on the website upon the completion of this project.

I'm interested in hearing your feedback. In my reading in EliteTrader, I've found the following metrics discussed, and I'm curious to see how well each can predict the community rankings:
Sharpe Ratio
Profit Factor
maxdrawdown
percent of trades profitable
average winning trade return
average losing trade return
ratio avg win / avg loss
max consecutive winners
max consecutive losers
largest winning trade
largest losing trade
profit per month
max time to recover
average MAE
average MFE
average ETD
Recovery factor
CAR/Maxdd
profit factor
rr ratio
ulcer index
K Ratio

I sincerely appreciate your help, and I know your time is valuable.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Great questions. See Jack Schwager's newest book: gain-to-pain ratio.
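
As a rough illustration of the gain-to-pain ratio mentioned here: as commonly described, it is the sum of all period returns (usually monthly) divided by the absolute value of the sum of the losing periods. The Python sketch below assumes that definition; the function name and the sample returns are made up.

```python
def gain_to_pain(returns):
    """Sum of all period returns divided by the absolute sum of the losing periods."""
    pain = abs(sum(r for r in returns if r < 0))
    if pain == 0:
        return float("inf")   # no losing periods at all
    return sum(returns) / pain


# Made-up monthly returns, purely for illustration.
monthly = [0.03, -0.01, 0.02, -0.02, 0.04]
print(gain_to_pain(monthly))
```

Under this definition the ratio works out to the profit factor minus one, since the numerator is gross gains minus gross losses.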

[M] (= Murray's input) Since your 2 charts [choices] made the same amount of money, start to finish, valley to peak, this is not a prediction, LOL. But the one with the smallest drawdown [smaller first-year drawdown + larger finish price] looks MUCH better, since both charts have the same total gain start to finish [5-year charts, your data says]. Not that 5 years of data means much; it does not. All data is helpful.

[A] An amazing number of turtles + big money do not pay much attention to "percent winners". Not saying ignore that; simply, many DO well + ignore it, LOL.

[R] RR ratio can be really important if you're not young/youthful;
a 77- or 88-year-old may not want to wait, or have time to wait, for big trends to resume, LOL.

Say, with all due respect, Mr. Bantam:
strategies not working for "several years" may be much less important than anything. For example, SPY has been in a bull market/bull trend for several years, so I see your point, Mr. Banty. But I would NOT ignore bear market, bear trend data. Thanks for the question.

Wisdom is profitable to direct.

MoreLeverage · May 17, 2014

I did 100. I think my preference was first for consistency (on a short term scale, small fluctuations), and then secondary factors were things like smaller/fewer drawdowns, consistency on longer scales (fewer periods of chop vs modest profitability), and, when there were larger periods of volatility, that they made money first and gave it back rather than vice versa.

Aileron · May 24, 2014

Sharpe is fine, when understood in context. It's not poor, it just has to be taken for what it is.

A lot of guys I know prefer Sortino. Doesn't punish upside volatility.
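
To make the Sharpe vs. Sortino difference concrete, here is a small Python sketch. It assumes the common definitions (Sharpe: mean excess return over sample standard deviation; Sortino: mean excess return over downside deviation relative to a target), leaves out annualization, and uses made-up return series and function names.

```python
import math


def sharpe(returns, target=0.0):
    """Mean excess return divided by the sample standard deviation (not annualized)."""
    excess = [r - target for r in returns]
    mean = sum(excess) / len(excess)
    var = sum((e - mean) ** 2 for e in excess) / (len(excess) - 1)
    return mean / math.sqrt(var)


def sortino(returns, target=0.0):
    """Mean excess return divided by the downside deviation (not annualized)."""
    excess = [r - target for r in returns]
    mean = sum(excess) / len(excess)
    # Only returns below the target contribute to the denominator.
    downside = math.sqrt(sum(min(e, 0.0) ** 2 for e in excess) / len(excess))
    return mean / downside


# Two made-up series with identical losing days; the second has one big up day.
steady = [0.01, -0.01, 0.01, -0.01, 0.01, 0.01]
spiky = [0.01, -0.01, 0.01, -0.01, 0.01, 0.07]

print(sharpe(steady), sharpe(spiky))
print(sortino(steady), sortino(spiky))
```

Both series have the same downside deviation, so Sortino rises in direct proportion to the mean return, while the big up day inflates Sharpe's denominator and mutes its improvement.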

bantam · May 29, 2014

MoreLeverage, thank you for your help ranking. I'll be wrapping up the data collection soon. I've been ranking a ton of charts myself to see how well an algorithm can learn my preferences. My next step is to re-rank all the same charts I'm doing now. Then I can see an upper bound on the performance to expect from the algorithm. I mean, if I'm only consistent 85% of the time, then I can't expect a machine learning algorithm to do better than that based on the input I've given it.
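
The self-consistency check described above could be measured along these lines. This is a minimal Python sketch, assuming each ranking pass is stored as a mapping from a chart pair to the chart that was preferred; the data layout, chart IDs, and function name are hypothetical.

```python
def agreement_rate(first_pass, second_pass):
    """Fraction of shared chart pairs on which both passes picked the same chart."""
    shared = first_pass.keys() & second_pass.keys()
    if not shared:
        raise ValueError("the two passes have no chart pairs in common")
    same = sum(1 for pair in shared if first_pass[pair] == second_pass[pair])
    return same / len(shared)


# Made-up picks: key = (chart_a, chart_b), value = the chart that was preferred.
first_pass = {
    ("c01", "c02"): "c01",
    ("c03", "c04"): "c04",
    ("c05", "c06"): "c05",
    ("c07", "c08"): "c08",
}
second_pass = {
    ("c01", "c02"): "c01",
    ("c03", "c04"): "c03",   # changed mind on this pair
    ("c05", "c06"): "c05",
    ("c07", "c08"): "c08",
}
print(agreement_rate(first_pass, second_pass))
```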

I've coded up all the relevant performance measures in this book: Practical Risk-Adjusted Performance Measurement
At this point, it looks like the (related) Pain Index / Ulcer Index / Martin Ratio are the most accurate predictors of community rankings. UlcerIndexExplained
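
For reference, here is a Python sketch of the three related drawdown measures as they are commonly defined: Pain Index as the mean percentage drawdown, Ulcer Index as the root-mean-square percentage drawdown, and Martin Ratio as return above the risk-free rate divided by the Ulcer Index. It is not the code used in this project; the function names and the equity curve are made up.

```python
import math


def drawdowns(equity):
    """Percentage drawdown from the running peak at each point (0 at new highs)."""
    peak = equity[0]
    out = []
    for value in equity:
        peak = max(peak, value)
        out.append(100.0 * (peak - value) / peak)
    return out


def pain_index(equity):
    """Mean percentage drawdown over the whole curve."""
    dd = drawdowns(equity)
    return sum(dd) / len(dd)


def ulcer_index(equity):
    """Root-mean-square percentage drawdown, so deep drawdowns weigh more."""
    dd = drawdowns(equity)
    return math.sqrt(sum(d * d for d in dd) / len(dd))


def martin_ratio(equity, return_pct, risk_free_pct=0.0):
    """Return above the risk-free rate per unit of Ulcer Index."""
    return (return_pct - risk_free_pct) / ulcer_index(equity)


# Made-up equity curve, purely for illustration.
equity = [100.0, 110.0, 99.0, 104.5, 110.0, 121.0]
print(pain_index(equity), ulcer_index(equity), martin_ratio(equity, 21.0))
```

All three share the same drawdown series, which is why they tend to rank charts similarly; they differ only in how heavily deep drawdowns are weighted.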

bantam · Jun 25, 2014

Everyone, thanks for your help. I feel like the research went well. I'm trying to get the results published in a CS conference, so forgive me for not making the data public quite yet.

I want to mention one thing that came out of the research - people have differing preferences. Some performance measures worked better than others overall, but the best approach is to learn preferences at the individual level. It also seems that it doesn't take very many rankings (about 50) before you can pretty well tell which measures work best for you, personally. So, I modified the original web page and made one that gives back a report. You click 50 times, and it tells you which measures you should use. Give it a try if you're interested. If enough people do it, I might use the data to try some clustering to see if people can be grouped into categories. We'll see.

http://snake.usc.edu/rank.php
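
One plausible way such a per-person report could be computed is sketched below in Python. This is not the actual logic behind the page, which has not been posted; it simply assumes each measure is scored by how often it would have picked the same chart as the user. The metric names, scores, and picks are all made up.

```python
def metric_accuracy(choices, scores):
    """Share of pairwise picks where the higher-scoring chart is the one the user chose.

    choices: list of (chart_a, chart_b, preferred_chart)
    scores:  {chart: metric value}, oriented so that higher means better
    """
    hits = 0
    for a, b, preferred in choices:
        predicted = a if scores[a] >= scores[b] else b
        hits += predicted == preferred
    return hits / len(choices)


def report(choices, metrics):
    """Return (metric name, accuracy) pairs, best-matching measure first."""
    ranked = sorted(
        ((metric_accuracy(choices, s), name) for name, s in metrics.items()),
        reverse=True,
    )
    return [(name, acc) for acc, name in ranked]


# Made-up picks and metric values for four hypothetical charts.
choices = [("A", "B", "A"), ("C", "D", "D"), ("A", "C", "C"), ("B", "D", "D")]
metrics = {
    "sharpe": {"A": 1.2, "B": 0.9, "C": 1.0, "D": 0.8},
    "neg_ulcer": {"A": -3.0, "B": -5.0, "C": -2.0, "D": -1.5},  # negated so higher is better
}
print(report(choices, metrics))
```

Measures where lower is better, such as the Ulcer Index, need to be negated (or otherwise flipped) before being passed in, as done for the hypothetical "neg_ulcer" entry.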