How to rate / compare strategy performance

Discussion in 'Strategy Development' started by psheridan050, Apr 28, 2008.

  1. Hi all,

    How do you compare or rate your strategies against each other when backtesting? It seems to me you would have to take into account the number of trades made, the time span of the backtest, annualized ROI, drawdown, etc. to get the answer.
  2. MGJ


    Some people invent "performance measurement statistics" that attempt to quantify the desirability or goodness or superiority of a set of backtest results. You could have a look at some of the ones that have already been invented and named. Search engines will help you find these:
    • Sharpe Ratio
    • Ulcer Index
    • Kestner K-Ratio
    • Return Retracement Ratio
    • Seykota Lake Ratio
    • Sortino Ratio
    • MAR Ratio
    • Length of longest drawdown (in months)
    • Depth of deepest drawdown (in %)
    • Profit Factor
    • Compound Annual Growth Rate (in % per year)
    • Treynor Ratio
    • Semideviation Sharpe Ratio
    • Robust Sharpe Ratio
    • Monte Carlo Median MAR Ratio
    There are a lot of them because each one takes a different view of what is desirable and what is undesirable; most of them are ratios of (pleasure) / (pain). Everyone, it seems, has a slightly different view of what constitutes pleasure and pain in a backtest. So traders and trading software developers keep inventing new performance measurement statistics with different types of pleasure and pain in the computation.
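    Several of the statistics in that list can be computed directly from an equity curve. The sketch below is a minimal Python illustration, not taken from any of the named authors or products. It assumes an equity curve sampled at regular intervals (e.g. monthly closes), simple per-period returns, a per-period risk-free rate, and textbook definitions: Ulcer Index as the root-mean-square of percentage drawdowns, and Sortino downside deviation measured against a target return of zero. Other software may define these slightly differently, and all function names are hypothetical.

```python
import math


def drawdowns(equity):
    """Fractional drawdown from the running peak at each point (0 at a new high)."""
    peak = equity[0]
    out = []
    for value in equity:
        peak = max(peak, value)
        out.append((peak - value) / peak)
    return out


def cagr(equity, years):
    """Compound annual growth rate, as a fraction per year."""
    return (equity[-1] / equity[0]) ** (1.0 / years) - 1.0


def max_drawdown(equity):
    """Depth of the deepest drawdown, as a fraction of the prior peak."""
    return max(drawdowns(equity))


def longest_drawdown(equity):
    """Longest run of consecutive periods spent below a prior peak."""
    longest = current = 0
    for dd in drawdowns(equity):
        current = current + 1 if dd > 0 else 0
        longest = max(longest, current)
    return longest


def mar_ratio(equity, years):
    """CAGR divided by maximum drawdown (undefined if there was no drawdown)."""
    return cagr(equity, years) / max_drawdown(equity)


def ulcer_index(equity):
    """Root-mean-square of percentage drawdowns over the whole curve."""
    pct = [100.0 * d for d in drawdowns(equity)]
    return math.sqrt(sum(p * p for p in pct) / len(pct))


def period_returns(equity):
    """Simple period-over-period returns of the equity curve."""
    return [equity[i] / equity[i - 1] - 1.0 for i in range(1, len(equity))]


def sharpe_ratio(returns, periods_per_year, risk_free=0.0):
    """Annualized mean excess return over its sample standard deviation.
    risk_free is assumed to be a per-period rate."""
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    var = sum((r - mean) ** 2 for r in excess) / (len(excess) - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)


def sortino_ratio(returns, periods_per_year, target=0.0):
    """Like Sharpe, but only returns below the target count as risk.
    Undefined if no return falls below the target."""
    mean = sum(r - target for r in returns) / len(returns)
    downside = sum(min(0.0, r - target) ** 2 for r in returns) / len(returns)
    return mean / math.sqrt(downside) * math.sqrt(periods_per_year)
```

    For a monthly curve, `longest_drawdown` counts months and `periods_per_year` is 12. Each ratio here is still a (pleasure) / (pain) choice: MAR penalizes only the single worst drawdown, while the Ulcer Index penalizes every period spent under water.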
  3. jhend746


    What are you trying to achieve? Are you looking for one ultimate strategy or a few complementary strategies? If you're comparing two variants of the same type of strategy, such as mean reversion, you could compare them with a two-sample t-test of average profit per trade (in this case your profits should not be a function of equity) or, preferably, of winning percentage, because its distribution will be approximately normal.
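    Both tests can be sketched with the Python standard library. Assumptions: per-trade profits come from a fixed position size, the t-test is Welch's unequal-variance version, and "comparing winning percent" is read as a pooled two-proportion z-test. The function names are hypothetical, and `welch_t` returns only the statistic and its degrees of freedom; the p-value would come from a t table or a statistics package.

```python
import math


def welch_t(a, b):
    """Welch's two-sample t statistic and approximate degrees of freedom.

    a, b: per-trade profits of two systems, assumed to use a fixed position
    size so that profits are not a function of equity.
    """
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    se2 = va / na + vb / nb
    t = (ma - mb) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


def two_proportion_z(wins_a, n_a, wins_b, n_b):
    """Pooled two-proportion z-test on win rates; returns (z, two-sided p)."""
    p_a, p_b = wins_a / n_a, wins_b / n_b
    pooled = (wins_a + wins_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

    For example, 60 winners out of 100 trades versus 50 out of 100 gives z of about 1.42 and a two-sided p of roughly 0.16, so that gap in win rate alone would not be convincing evidence that one system is better.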

    If you're comparing two different types of systems to each other, then you need to decide your selection criteria in advance. Prioritize system characteristics and use performance measures that incorporate these characteristics. A priori, rate the desirability of each characteristic with a number from 1-10. Then rate each system's adherence to those traits with a number from 1-10. Multiply the two numbers for each trait and add up the points to decide on the better system.
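    The a priori scoring scheme fits in a small function. The trait names and numbers below are made-up placeholders, and `weighted_score` is a hypothetical name.

```python
def weighted_score(importance, ratings):
    """Sum over traits of (a priori importance, 1-10) x (how well the system
    exhibits the trait, 1-10). Higher totals mean a better fit to your criteria."""
    for table in (importance, ratings):
        for trait, value in table.items():
            if not 1 <= value <= 10:
                raise ValueError(f"{trait}: score {value} is outside 1-10")
    return sum(weight * ratings[trait] for trait, weight in importance.items())


# Hypothetical criteria, decided before looking at any backtest results.
importance = {"shallow_drawdowns": 9, "many_trades": 4, "low_exposure": 6}

system_a = {"shallow_drawdowns": 7, "many_trades": 5, "low_exposure": 3}
system_b = {"shallow_drawdowns": 5, "many_trades": 9, "low_exposure": 6}
```

    With these placeholder numbers, system A scores 9*7 + 4*5 + 6*3 = 101 and system B scores 9*5 + 4*9 + 6*6 = 117, so B wins even though A has the shallower drawdowns. Fixing the importance weights before seeing any results is what keeps the scheme honest.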

    If I really like 2 systems and my performance measures are insignificantly different, then I usually choose the one with less exposure.
  4. kut2k2


    Amazingly, and amusingly, the one performance stat that is almost never mentioned is the most important one: the buy-and-hold index (BHI).

    Think about it. What's the point of trading? To beat the market. Making a profit means little if you're still doing worse than the old granny strategy of simply buying and holding on tight. I see some ET posters brag about how profitable their strategies are without ever mentioning how they're doing compared to the baseline: the buy-and-hold profitability. Which makes me believe they aren't beating the market (the buy-and-hold).

    All other performance numbers are window dressing. The bottom line is your BHI.
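    No formula for the BHI is given here, so the sketch below assumes one plausible reading: the strategy's total return minus the buy-and-hold return of the same instrument over the same dates. Both the definition and the function names are hypothetical.

```python
def total_return(series):
    """Total fractional return from the first to the last value of a series."""
    return series[-1] / series[0] - 1.0


def excess_over_buy_and_hold(strategy_equity, prices):
    """Hypothetical BHI reading: strategy total return minus the buy-and-hold
    return of the same instrument over the same dates. Positive means the
    strategy beat simply buying and holding."""
    return total_return(strategy_equity) - total_return(prices)
```

    For example, a strategy that grows 100 to 130 while the stock goes from 50 to 60 earns 30% against a 20% buy-and-hold return, an excess of 10 percentage points. A ratio of the two returns would be an alternative, but it breaks down when the buy-and-hold return is zero or negative.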
  5. MGJ


    I don't think this is true if you are trading Forex or futures. Our OP doesn't say what instruments he is backtesting.
  6. jhend746


    I'm assuming that the person writing the system is competent enough to figure out whether the system is even worth trading. The question is about comparing two systems. Buy-and-hold is not the be-all and end-all of performance measurement. What if your system earned less than the market but was less risky? It all depends on your return-to-risk profile and the many ways risk manifests itself. Just to see whether your system is worth trading, you need to know whether it significantly beats a random system trading the same markets with the same holding period. This is what you should know before testing two systems against each other.
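    A random-entry benchmark of this kind can be approximated with a Monte Carlo simulation. This sketch assumes long-only trades with a fixed holding period on a single price series, entry bars drawn uniformly at random, and equally weighted trades; all names are hypothetical. It returns the fraction of random runs whose average trade return is at least the system's, which plays the role of a one-sided p-value.

```python
import random


def random_entry_benchmark(prices, holding_period, n_trades, system_avg,
                           n_runs=1000, seed=0):
    """Fraction of random-entry runs whose average trade return >= system_avg.

    Each run takes n_trades long trades, each entered on a random bar and
    exited holding_period bars later. A small result suggests the system's
    average trade is unlikely to come from random entries alone.
    """
    if holding_period >= len(prices):
        raise ValueError("holding period is longer than the price series")
    rng = random.Random(seed)
    last_entry = len(prices) - holding_period - 1  # randint is inclusive
    at_least_as_good = 0
    for _ in range(n_runs):
        total = 0.0
        for _ in range(n_trades):
            i = rng.randint(0, last_entry)
            total += prices[i + holding_period] / prices[i] - 1.0
        if total / n_trades >= system_avg:
            at_least_as_good += 1
    return at_least_as_good / n_runs
```

    Matching the number of trades and the holding period to the real system matters: random systems in a rising market also make money, and this comparison asks whether the entry rules add anything beyond that drift.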
  7. Sorry it took so long for me to post back; I work nights. Anyhow, I am referring to backtests of stocks only, and the strategies being compared may be similar to each other or completely unique within the group. For the purposes of this discussion, let’s not worry about whether a strategy beats the market or is even profitable. Let’s concentrate on comparing strategies against each other. If all the strategies suck, which one sucks the least?

    The main problem here is that these strategies aren’t necessarily an apples-to-apples comparison. Certainly things like average profit per trade and maximum drawdown are easy to compare. But what if the backtests cover different spans of time (strategy A was backtested over 4 years while strategy B was backtested over 2 years)? Or what if they don’t have the same number of trades (strategy A’s backtest had 600 trades while strategy B’s had 200)? One possible solution that I know of is to annualize performance measures such as ROI.
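    Annualizing puts backtests of different lengths on the same footing, and the trade count affects how much a per-trade average can be trusted. A minimal sketch with made-up numbers: a 4-year backtest that returned 80% in total compounds at about 15.8% per year, while a 2-year backtest that returned 40% compounds at about 18.3% per year, so the shorter test is better on an annualized basis. The standard-error helper assumes independent trades; all names are hypothetical.

```python
import math
import statistics


def annualized_return(total_return, years):
    """Compound annual rate implied by a total backtest return over `years`."""
    return (1.0 + total_return) ** (1.0 / years) - 1.0


def trades_per_year(n_trades, years):
    """Trade frequency, so backtests of different lengths can be compared."""
    return n_trades / years


def standard_error(trade_profits):
    """Standard error of the average profit per trade, assuming independent
    trades. It shrinks with the square root of the number of trades."""
    return statistics.stdev(trade_profits) / math.sqrt(len(trade_profits))
```

    In the 600-trade vs. 200-trade example, and with similar trade-to-trade variability, the average profit per trade from strategy A is known about sqrt(3), or 1.7 times, more precisely than strategy B's, even though both trade at a similar yearly rate (150 vs. 100 trades per year).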

    Here are a few things about backtests that I think might be important to compare, in addition to the interesting items MGJ listed in his first post:

    • Correlation of the equity curve to perfect profit
    • Average drawdown per trade
    • Profit factor
    • Number of trades conducted
    • Span of time the backtest covers
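    Two of those items are straightforward to compute. The sketch takes the "perfect profit" curve as an input rather than defining it; the `straight_line` reference is only an assumed stand-in for illustration, not a standard definition of perfect profit. All names are hypothetical.

```python
import math


def profit_factor(trade_profits):
    """Gross profit divided by gross loss (infinite if there were no losers)."""
    gross_profit = sum(p for p in trade_profits if p > 0)
    gross_loss = -sum(p for p in trade_profits if p < 0)
    return math.inf if gross_loss == 0 else gross_profit / gross_loss


def pearson(x, y):
    """Pearson correlation between two equal-length series, e.g. an equity
    curve and a reference curve sampled on the same dates."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def straight_line(start, end, n):
    """Assumed stand-in reference curve: a steady rise from start to end."""
    return [start + (end - start) * i / (n - 1) for i in range(n)]
```

    For example, trades of +100, -50, +50 and -25 give a profit factor of 150 / 75 = 2.0. A correlation near 1 means the equity curve tracks the chosen reference closely.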

    I have the feeling that there isn’t a Holy Grail fitness test that I can apply to a strategy’s backtest and simply use to compare strategies. My instinct tells me that I will need a few different formulas to represent the good and bad of a particular backtest. Narrowing down which measures to use is the challenge.
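    One generic way to combine several measures without a single Holy Grail statistic is to rank the strategies on each measure and average the ranks. This is a common technique, not something proposed in the thread; ties are broken arbitrarily by sort order, and the metric names are placeholders.

```python
def average_rank(metrics, higher_is_better):
    """Average rank of each strategy across several measures (1 = best).

    metrics: {strategy: {measure: value}}
    higher_is_better: {measure: True if larger values are better}
    """
    names = list(metrics)
    totals = {name: 0.0 for name in names}
    for measure, higher in higher_is_better.items():
        # Best value first: descending if higher is better, else ascending.
        ordered = sorted(names, key=lambda s: metrics[s][measure], reverse=higher)
        for rank, name in enumerate(ordered, start=1):
            totals[name] += rank
    return {name: totals[name] / len(higher_is_better) for name in names}
```

    Ranks ignore how far apart the strategies are on each measure, which makes the composite robust to one extreme value but blind to large gaps. Weighting some measures more heavily is an easy extension.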

    One application I would like to build is making my trading system (my group of strategies) adaptive. In simplified terms, by continually reviewing how the strategies measure up against one another, I can determine which particular strategies should have trading dollars allocated to them. This type of system would lag the market somewhat, but in the end I think it would be of benefit, or at least an interesting experiment. Also, all my trading is mechanical, with little human intervention.
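    The adaptive idea could start as simply as this: score each strategy over a trailing window of its recent trades and allocate capital in proportion to positive scores. The window length, the scoring rule (sum of recent trade profits), and the zero allocation for non-positive scores are all assumptions for illustration; the function names are hypothetical.

```python
def trailing_score(trade_profits, window):
    """Sum of the most recent `window` trade profits (an assumed scoring rule)."""
    return sum(trade_profits[-window:])


def allocate(scores, capital):
    """Split capital in proportion to positive scores. Strategies with
    non-positive scores get nothing; if none is positive, stay in cash."""
    positive = {name: max(0.0, s) for name, s in scores.items()}
    total = sum(positive.values())
    if total == 0:
        return {name: 0.0 for name in scores}
    return {name: capital * s / total for name, s in positive.items()}


# Hypothetical trade histories for three strategies.
history = {"A": [50, -20, 80, 40], "B": [10, 30, -10, 40], "C": [20, -60, 5, -15]}
scores = {name: trailing_score(p, 3) for name, p in history.items()}
weights = allocate(scores, 32000)
```

    Here the last three trades score A at 100, B at 60 and C at -70, so $32,000 is split $20,000 / $12,000 / $0. The lag mentioned above comes from the window: a shorter window reacts faster but chases noise, and any other measure discussed in this thread could replace the trailing sum as the score.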