System Performance Score

Discussion in 'Automated Trading' started by kut2k2, Feb 28, 2013.

Thread Status:
Not open for further replies.
  1. kut2k2

    Here is the final version of the System Performance Score (SPS). The previous versions suffered from too much focus on the NOBF, which, it turns out, is entirely unnecessary.

    SPS = (p*(W/L) - q) * min[1, N/1000],
    where
    p is the winrate,
    W is the average winning trade return (%),
    L is the average losing trade return (%), taken as a positive magnitude,
    q is 1-p,
    N is the number of trades in the backtest.

    Now let's revisit the systems listed by nonlinear5 (the trade-count factor min[1, N/1000] is left out of these toy calculations):

    System A : [+10, +10, +10, +10]
    System B : [+20, +20, +20, +20]

    SPS(A) = (1(10/0) - 0) = infinity, as expected.
    SPS(B) = (1(20/0) - 0) = infinity, as expected.

    System C : [+10, -5, +10, -5]
    System D : [+20, -10, +20, -10]

    SPS(C) = (.5(10/5) - .5) = .5;
    SPS(D) = (.5(20/10) - .5) = .5 = SPS(C), as expected.

    System E : [-8, -8, -8, +25, +25]
    System F : [-1, +1, +1, +21, +50]

    SPS(E) = (.4(25/8) - .6) = .65,
    SPS(F) = (.8(18.25/1) - .2) = 14.4
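
    For anyone who wants to check the numbers above, here is a minimal Python sketch of the formula. The function name `sps` and the parameters `min_trades` and `use_count_factor` are illustrative assumptions, not part of the original definition. Treating a system with no losing trades as infinite follows the A and B examples.

```python
import math

def sps(returns, min_trades=1000, use_count_factor=True):
    """SPS = (p*(W/L) - q) * min[1, N/min_trades] for per-trade returns in %.

    Assumes a non-empty list; zero-return trades count toward N and q only.
    """
    n = len(returns)
    wins = [r for r in returns if r > 0]
    losses = [-r for r in returns if r < 0]  # loss sizes as positive numbers
    p = len(wins) / n
    q = 1.0 - p
    if losses:
        avg_win = sum(wins) / len(wins) if wins else 0.0
        avg_loss = sum(losses) / len(losses)
        raw = p * (avg_win / avg_loss) - q
    else:
        # No losing trades: W/L divides by zero, treated here as +infinity.
        raw = math.inf if wins else 0.0
    factor = min(1.0, n / min_trades) if use_count_factor else 1.0
    return raw * factor

systems = {
    "A": [10, 10, 10, 10],
    "B": [20, 20, 20, 20],
    "C": [10, -5, 10, -5],
    "D": [20, -10, 20, -10],
    "E": [-8, -8, -8, 25, 25],
    "F": [-1, 1, 1, 21, 50],
}
for name, trades in systems.items():
    raw = sps(trades, use_count_factor=False)
    print(f"SPS({name}) without count factor = {raw:.3f}")
```

    With the count factor switched on, a four-trade system such as C is scaled by 4/1000, so its score drops from 0.5 to 0.002.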

    :cool:
     
    #21     Mar 21, 2013
  2. gip3

    gip3 Guest

    Um... the Sharpe ratio gives you the same EXACT ranking...

    System A : SR[+10, +10, +10, +10] = +Inf
    System B : SR[+20, +20, +20, +20] = +Inf

    System C : SR[+10, -5, +10, -5] = 0.289
    System D : SR[+20, -10, +20, -10] = 0.289


    System E : SR[-8, -8, -8, +25, +25] = 0.29
    System F : SR[-1, +1, +1, +21, +50] = 0.66

    Where SR(x) = avg(x) / std(x)

    If you want to use the 'number of trades' idea, then you can just do sqrt(N)*SR(x). The square root has some grounding in significance testing, unlike your linear multiplier.
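
    A minimal sketch of these two quantities in Python, for checking the figures above. It assumes std(x) is the sample standard deviation (dividing by N - 1), which is the choice that reproduces the quoted values. The function names and the handling of zero deviation are illustrative assumptions.

```python
import math
import statistics

def sharpe(returns):
    """avg(x) / std(x), using the sample standard deviation (N - 1)."""
    mean = statistics.mean(returns)
    sd = statistics.stdev(returns)
    if sd == 0:
        # Constant returns: treat a positive mean as +infinity (assumption).
        return math.inf if mean > 0 else math.nan
    return mean / sd

def scaled_sharpe(returns):
    """sqrt(N) * SR(x), the trade-count-adjusted version suggested above."""
    return math.sqrt(len(returns)) * sharpe(returns)

systems = {
    "A": [10, 10, 10, 10],
    "C": [10, -5, 10, -5],
    "E": [-8, -8, -8, 25, 25],
    "F": [-1, 1, 1, 21, 50],
}
for name, trades in systems.items():
    print(f"SR({name}) = {sharpe(trades):.3f}, "
          f"sqrt(N)*SR = {scaled_sharpe(trades):.3f}")
```

    Using the population standard deviation instead would give different values (for example 0.333 rather than 0.289 for system C).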

    So... what exactly is the contribution of your SPS? Seems unnecessary from the examples you gave.

     
    #22     Mar 21, 2013
  3. Let me code this up tonight in Excel and check it out.

    I don't see the problem with assuming a risk-free rate of return of 1% for the past 5 years, and likely into the future as well. If it varied wildly, like back in the '80s, then it would be significant.
     
    #23     Mar 21, 2013
  4. Your "version 3" of SPS is definitely better than the previous two versions. However, on analysis, it turns out to be a butchered version of the Kelly Criterion:

    Kelly = p - q / (W/L)
    SPS = Kelly * (W/L) = (p - q / (W/L)) * (W/L) = p * (W/L) - q

    So, in effect, your "final SPS" is simply the "classic" Kelly multiplied by W/L. The advantage of this additional multiplication is dubious to me, since the term (W/L) is already present in the Kelly.
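
    The algebra above can be checked numerically with a short Python sketch. The function names are illustrative assumptions, `kelly_as_posted` is simply the expression written above (p - q / (W/L)), and the (p, W, L) triples are taken from systems C, E and F earlier in the thread.

```python
import math

def kelly_as_posted(p, w, l):
    """The expression given above: p - q / (W/L), with q = 1 - p."""
    return p - (1.0 - p) / (w / l)

def sps_raw(p, w, l):
    """SPS without the trade-count factor: p*(W/L) - q."""
    return p * (w / l) - (1.0 - p)

# (winrate, average win, average loss) for systems C, E and F
examples = {"C": (0.5, 10, 5), "E": (0.4, 25, 8), "F": (0.8, 18.25, 1)}

for name, (p, w, l) in examples.items():
    k = kelly_as_posted(p, w, l)
    s = sps_raw(p, w, l)
    # The identity SPS = Kelly * (W/L) should hold up to rounding error.
    assert math.isclose(k * (w / l), s)
    print(f"{name}: Kelly = {k:.3f}, W/L = {w / l:.3f}, SPS = {s:.3f}")
```

    In these three cases the posted Kelly expression stays below 1 while SPS grows with W/L, which is the effect of the extra multiplication.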

    With regard to the min[1, N/1000] multiplier, I agree with gip3, who commented above. That is, sqrt(N) is better than min[1, N/1000]. It's more continuous and more statistically valid, because as N grows larger, the standard error scales as [1 / sqrt(N)]. For example, consider two systems, A and B. Between these two systems, everything is exactly the same except that system A made 10,000 trades and system B made 15,000 trades. Your multiplier would then rate system B as 1.5 times better than system A, whereas statistically (and again, by common sense), system B is not that much better. How much better then? The answer is sqrt(15000) / sqrt(10000), which is about 1.22.
     
    #24     Mar 21, 2013
  5. kut2k2

    If you feel that system F is only a bit more than twice as good as system E, then by all means use your measure.

    My gut tells me that my numbers make more sense than yours do.
     
    #25     Mar 21, 2013
  6. kut2k2

    What you posted isn't the Kelly fraction. Once again, go read the "Bad Kelly" thread in Trade Management to find out why.
     
    #26     Mar 21, 2013
  7. gip3

    gip3 Guest

    There's a lot of work backing up the use of the Sharpe ratio. It's also immediately relatable to statistical significance testing. Sure, it's got shortcomings, but we can analyze those shortcomings pretty well within a statistical framework (i.e., what's the exact impact of kurtosis on the ability of Sharpe to miscapture performance).

    Your score has no theory to back up what it's measuring. In fact, it decomposes the trading outcome to a binary world: wins and losses. So I'm entirely at a loss as to why you would find it better? It makes a far more restrictive set of assumptions on the outcome of a trade, and there's no underlying rationale why your variables are laid out that way.
     
    #27     Mar 21, 2013
  8. kut2k2

    kut2k2

    You've misinterpreted the count parameter, which has an upper limit, by the way, unlike sqrt(N).

    The point of min[1, N/1000] is to penalize SPS values that are based on an insufficient number of trades, i.e., below 1000 trades. No need for a square root version because the standard deviation is not a factor in the SPS.
     
    #28     Mar 21, 2013
  9. Your "final SPS" is nothing but what you call "Bad Kelly" multiplied by W/L and adjusted for the number of trades. That's it. Again, this is a dubious "improvement". It can be easily shown that your formula overrates W/L. I'll leave the verification up to you.
     
    #29     Mar 21, 2013
  10. For N >= 1000, min[1, N/1000] always results in 1.

    So, it does not make any distinction between system A with 1,000 trades and system B with 10,000 trades. That's way too discontinuous, not to mention that the boundary of "1,000 trades" is too arbitrary.
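
    To make the comparison concrete, here is a small Python sketch that tabulates both multipliers discussed in this thread. The function names and the `threshold` parameter are illustrative assumptions.

```python
import math

def count_factor(n, threshold=1000):
    """The SPS multiplier min[1, N/threshold]; capped at 1."""
    return min(1.0, n / threshold)

def sqrt_factor(n):
    """The sqrt(N) multiplier suggested earlier in the thread; unbounded."""
    return math.sqrt(n)

for n in (100, 500, 1000, 10_000, 15_000):
    print(f"N={n:>6}: min[1, N/1000] = {count_factor(n):.2f}, "
          f"sqrt(N) = {sqrt_factor(n):.1f}")

# Relative rating of a 15,000-trade system versus a 10,000-trade one
ratio_min = count_factor(15_000) / count_factor(10_000)
ratio_sqrt = sqrt_factor(15_000) / sqrt_factor(10_000)
print(f"ratio under min rule: {ratio_min:.2f}, under sqrt rule: {ratio_sqrt:.2f}")
```

    Under the min rule both systems get a factor of 1, so the ratio is 1.00; under the sqrt rule it is about 1.22.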
     
    #30     Mar 21, 2013