how do u determine if something is statistically significant?

Discussion in 'Trading' started by Gordon Gekko, Jul 11, 2003.

  1. Sharp

    Sharp

    I'm trying to post the chart.
     
    #31     Jul 12, 2003
  2. Sharp

    Sharp

    Here's the chart.
     
    #32     Jul 12, 2003
  3. Sharp

    Sharp

    You may also notice that my target levels are usually around my 35 EMA. I use that as well.
     
    #33     Jul 12, 2003
  4. acrary

    acrary

    One of the problems with trading is that all the past information is only a sample of some unknowable future distribution. The amount of price movement over a period of time also does not conform to a normal distribution, so stats can only be used for rough estimation. I've posted this before, but I think this is a good thread to repost it.

    No matter what test you do, the trades are only ever going to be a sample of the ultimate distribution. If it shows 60% winners for the past 10 years, that may be the true mean or only a skewed result from your tests. Here's something to stress test the sample.

    To estimate the error in a system test sample:

    (This can be used for win % because, over a reasonable number of trades, the observed frequency of wins is approximately normally distributed.)

    Error estimate

    E = (z * std. dev. of sample) / sqrt(number of samples in test)

    E = Error estimate
    z = number of std. dev. of normal distribution for the confidence level needed.
    z = 3.09 = 99.8% confidence level
    z = 2.58 = 99.0% confidence level
    z = 1.96 = 95.0% confidence level
    z = 1.645 = 90.0% confidence level

    Ex. 50 trades in test (1 = win 0 = loss)
    sample mean = 40% winners or .40
    sample std. dev. = .25

    If we want to know the estimate of the mean to the 99% level then:

    E = (2.58 * .25) / sqrt(50)

    E = .0912

    So with 99% confidence, the mean winning % lies in the range .40 +- .0912 (you can expect to see wins between 30.88% and 49.12% in the future). If that's not acceptable, either do more tests or work on a system with a tighter standard deviation of wins versus losses.
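
    A minimal Python sketch of the error estimate above, reusing the post's numbers (50 trades, mean .40, std. dev. .25, z = 2.58). The function names `z_for_confidence` and `error_estimate` are made up for illustration. One caveat is flagged as an assumption in the code: for trades coded 1 = win, 0 = loss, a 40% win rate implies a std. dev. of sqrt(.4 * .6), roughly .49, so the .25 is taken as given here rather than derived.

```python
import math
from statistics import NormalDist


def z_for_confidence(confidence: float) -> float:
    """Two-sided z value for a confidence level, e.g. 0.99 -> about 2.58."""
    return NormalDist().inv_cdf(0.5 + confidence / 2.0)


def error_estimate(z: float, sample_std: float, n: int) -> float:
    """E = (z * std. dev. of sample) / sqrt(n), the formula from the post."""
    return z * sample_std / math.sqrt(n)


# Numbers from the post: 50 trades, 40% winners, std. dev. taken as 0.25.
mean_win = 0.40
E = error_estimate(2.58, 0.25, 50)
low, high = mean_win - E, mean_win + E
print(f"E = {E:.4f}, range = {low:.4f} to {high:.4f}")

# Assumption (not in the post): with 1 = win, 0 = loss coding, the std. dev.
# implied by a 40% win rate is sqrt(p * (1 - p)), which gives a wider range.
bernoulli_std = math.sqrt(mean_win * (1 - mean_win))
E_bernoulli = error_estimate(2.58, bernoulli_std, 50)
print(f"E with implied std. dev. {bernoulli_std:.2f} = {E_bernoulli:.4f}")
```

    `z_for_confidence(0.99)` gives roughly 2.58, so the hard-coded z values can also be computed instead of looked up.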

    So how many samples do we need to be 99% certain of the mean?

    n = ((z**2) * (std. dev. of sample**2)) / (( 1 - confidence level required)**2)

    n = number of tests we need to run
    z = same as above
    std. dev. of sample = std. dev. from sample size we have seen
    1 - confidence level required = how exact do we want it:
    .90 confidence = 1 - .9 or .1 for the formula
    .95 confidence = 1 - .95 or .05 for the formula
    .99 confidence = 1 - .99 or .01 for the formula
    .998 confidence = 1 - .998 or .002 for the formula

    in this case we want 99% confidence

    n = ((2.58**2)* (.25**2)) / (.01**2)

    n = (6.6564 * .0625) / .0001

    n = 4,160 tests needed to prove the mean at the 99% confidence level is really 40% winners.
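
    The same arithmetic as a short Python sketch. In the usual textbook form of this formula the denominator is the margin of error you are willing to accept (how wide the +- band may be); the post calls it "1 - confidence level required", so the .01 below acts as a +- 1 percentage point margin. The function name `samples_needed` and its parameter names are assumptions for illustration only.

```python
def samples_needed(z: float, sample_std: float, margin: float) -> float:
    """n = (z**2 * std**2) / margin**2, written as a single square."""
    return (z * sample_std / margin) ** 2


# Numbers from the post: z = 2.58, std. dev. = 0.25, denominator term = 0.01.
n = samples_needed(2.58, 0.25, 0.01)
print(f"n = {n:.2f}")  # roughly 4,160 trades

# A looser band needs far fewer trades, since n scales with 1 / margin**2.
n_loose = samples_needed(2.58, 0.25, 0.05)
print(f"n for a +-5 point band = {n_loose:.2f}")
```

    Because n grows with the square of 1/margin, halving the acceptable band quadruples the number of trades needed.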

    After you've done the test for win% you can also do it for win size and loss size (independently). Usually the win size will not correspond to a normal distribution. If you're cutting losses short and letting profits run, then you should have some outlier trades in the win size distribution. For the test to be valid you need to eliminate the outliers. I've found that removing the top 5% of winning trades (the best 5 out of each 100) has been enough to move the distribution to a more normal bell curve.
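
    One way the 5% trim could look in Python. This is only a sketch: the helper name `trim_top_winners` and the win sizes below are invented for illustration, not real trade data.

```python
def trim_top_winners(wins, fraction=0.05):
    """Return the wins sorted ascending with the largest `fraction` removed."""
    n_drop = int(len(wins) * fraction)
    ordered = sorted(wins)
    # Slice by explicit length so n_drop == 0 keeps every trade.
    return ordered[: len(ordered) - n_drop]


# Hypothetical win sizes in dollars: 95 ordinary wins plus 5 outliers.
wins = [400 + 2 * i for i in range(95)] + [3000, 3500, 4000, 5000, 8000]
trimmed = trim_top_winners(wins)

mean_all = sum(wins) / len(wins)
mean_trimmed = sum(trimmed) / len(trimmed)
print(f"mean of all wins: {mean_all:.2f}")
print(f"mean after dropping top 5%: {mean_trimmed:.2f}")
```

    The trimmed list is what would then go into the error estimate for win size.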

    When you've done the tests on the win size and loss size, you'll end up with something like:

    win size mean $500 +- $100 at the 99% confidence level

    loss size mean $250 +- $50 at the 99% confidence level

    Then you compute a pessimistic expectation using the low end of win % and win size and the high end of the loss size. If it shows any profit, then you've probably got a winner (as long as it wasn't curve fit).

    Ex.
    E = (400 * .5) - (300 * .5)
    E = 50
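
    The pessimistic expectation as a small Python sketch. The function and parameter names are assumptions for illustration. It uses the post's figures: win size $500 +- $100, loss size $250 +- $50, and .5 as the low end of the win % (passed with a zero error term because the post gives only the low-end value).

```python
def pessimistic_expectation(win_rate, win_rate_err,
                            win_size, win_size_err,
                            loss_size, loss_size_err):
    """Expectation per trade using the unfavorable end of every estimate."""
    p = win_rate - win_rate_err      # low end of win %
    w = win_size - win_size_err      # low end of win size
    l = loss_size + loss_size_err    # high end of loss size
    return p * w - (1 - p) * l


expectation = pessimistic_expectation(0.5, 0.0, 500, 100, 250, 50)
print(f"E = {expectation:.2f} per trade")
```

    With these inputs the result matches the E = 50 worked out above.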
     
    #34     Jul 12, 2003
  5. bulat

    bulat

    And how can you tell if something is curve fit or not? It seems to me, that's the most important question.

    bulat
     
    #35     Jul 12, 2003
  6. Damn... acrary...

    What can I add... hrmmm...

    Ummm...

    :eek:

    ... Well, along with the Std. Dev. of the system, I do the following...

    The whole point would be to look into the characteristics of the system in out-of-sample testing. Personally, I also take the 100-trade moving average of statistics like %Profitability, Risk/Reward Ratio, and others over the last 100 trades to see if the trades stay consistent.

    It's close to what acrary is saying. I want to see a flat line.
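
    A rough Python sketch of tracking those statistics over a rolling 100-trade window. The function name `rolling_stats` and the P&L series are hypothetical; a real test would feed in the out-of-sample trade list.

```python
def rolling_stats(pnl, window=100):
    """Return (win_rate, reward_risk) for each full window of trades."""
    rows = []
    for end in range(window, len(pnl) + 1):
        chunk = pnl[end - window:end]
        wins = [p for p in chunk if p > 0]
        losses = [-p for p in chunk if p < 0]
        win_rate = len(wins) / window
        # Reward/risk = average win / average loss; None when undefined.
        if wins and losses:
            ratio = (sum(wins) / len(wins)) / (sum(losses) / len(losses))
        else:
            ratio = None
        rows.append((win_rate, ratio))
    return rows


# Hypothetical P&L: 2 wins of $200 then 3 losses of $100, repeated.
pnl = [200 if i % 5 < 2 else -100 for i in range(300)]
rows = rolling_stats(pnl)
print(rows[0], rows[-1])
```

    A stable system gives nearly the same pair in every window (the "flat line"); a drifting win rate or ratio shows up as a slope.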
     
    #36     Jul 12, 2003
  7. Personally, I think every system... every methodology... implements some sort of curve-fitting. In the end, you're trying to extract a certain tendency of the market that occurs frequently in your favor, for profit.

    Even if you're a discretionary trader, you're curve-fitting your mentality with the market. That's what being in sync with the market is all about.

    So as long as you're basing your trade on the market, curve-fitting is inevitable.

    But... "over" curve-fitting is a big problem. Trying to capture every kind of pattern within a certain data set would eventually tie your mind and signals to that data set. Because the market is ever changing, optimal curve-fitting eventually loses its edge.

    So... traders and system developers need more "robust" curve-fitting criteria to work with. One way is to use a large trading sample. Another would be to keep everything simple and flexible.

    There are other techniques that let you un-"over" curve fit, but I think you get the idea...
     
    #37     Jul 12, 2003
  8. Here is a video to build the intuition: in practice a sample doesn't need to reach a size of 30, but a smaller size must be compensated by the number of samples, as I said.

    http://harrytrader.membres.jexiste.org/theoreme_central_limite.avi

    See comment here and also for another video:
    http://www.elitetrader.com/vb/showthread.php?s=&postid=290994#post290994

    Here is a picture for size=2. For the second law it's really bad, but see the video for what happens when the size is 5 or above.
    Image: http://harrytrader.membres.jexiste.org/central_theorem_limit.gif

    NOTE THAT THE DISTRIBUTION OF THE INDIVIDUAL VALUES CAN BE ANYTHING, NOT NECESSARILY NORMAL; THAT DOESN'T PREVENT THE SO-CALLED CENTRAL LIMIT THEOREM FROM BEING TRUE.

    P.S.: Hurry to download it, because it takes up 5 MB on my web server and I lack space, so I will have to erase it before Monday.

    ALSO, DON'T PLAY THE VIDEO ONLINE, YOU MAY GET AN ERROR. DOWNLOAD IT FIRST.

    P.S.2: If you have the latest Windows Media Player you should have no problem, but if you do, go to http://www.techsmith.com/download/studiodefault.asp to download the TSCC Codec.
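
    For anyone who can't get the video, a small Python simulation gives the same intuition. It is a sketch under stated assumptions: the individual values are drawn from an exponential distribution (a stand-in for "anything but normal"), and skewness of the sample means is used as a crude measure of how far they are from a bell curve. The function names and the seed are arbitrary.

```python
import random


def sample_means(size, count, rng):
    """Means of `count` samples, each made of `size` exponential draws."""
    return [sum(rng.expovariate(1.0) for _ in range(size)) / size
            for _ in range(count)]


def skewness(xs):
    """Population skewness: 0 for a symmetric distribution."""
    m = sum(xs) / len(xs)
    sd = (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5
    return sum(((x - m) / sd) ** 3 for x in xs) / len(xs)


rng = random.Random(42)  # fixed seed so the run is repeatable
skew_by_size = {size: skewness(sample_means(size, 5000, rng))
                for size in (2, 5, 30)}
for size, skew in skew_by_size.items():
    print(f"size={size:2d}  skewness of sample means={skew:.2f}")
```

    The skewness shrinks as the sample size grows, which is the effect described above: the individual values stay skewed, but their averages look more and more normal.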




     
    #38     Jul 13, 2003
  9. nitro

    nitro

    Another way of saying the same thing is that the series is not stationary, i.e. not I(0). In other words, the "rules" that generate the time series change over time, and these rule changes can occur without any means of detecting the change using statistical or other (known) measures.

    The normal distribution is not terrible at modeling the 97% of the curve at the "hump." It is at the tails where it is really off. Some modelers then use the (log) normal distribution to model the hump, and the Pareto distribution to model the tails.

    I am not sure that the markets are following "one ultimate" distribution. It may be a linear combination of them, or even a nonlinear combination of them. I suppose the whole of them can be thought of as one "grand" distribution.

    The t-distribution may be more appropriate with such a small number of samples...
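
    To put a rough number on that, here is a short Python comparison for acrary's 50-trade example. The t critical value is an assumption read from a standard t table (two-sided 99%, 49 degrees of freedom, about 2.68) and is hard-coded because the Python standard library has no t-distribution; treat it as approximate.

```python
import math

Z_CRIT_99 = 2.58        # normal critical value used in acrary's post
T_CRIT_99_DF49 = 2.68   # assumed approximate t-table value, df = 49

n = 50                  # trades in the test
sample_std = 0.25       # std. dev. from acrary's example

E_z = Z_CRIT_99 * sample_std / math.sqrt(n)
E_t = T_CRIT_99_DF49 * sample_std / math.sqrt(n)
print(f"normal: +-{E_z:.4f}   t: +-{E_t:.4f}")
```

    At 50 trades the t-based band is only slightly wider; the gap grows quickly as the number of trades drops.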

    This is a nice post. I am not a system trader, but the "principles" should apply to my "equity curve."

    nitro
     
    #39     Aug 28, 2003
  10. NinjaTrader_Dierk

    NinjaTrader_Dierk ET Sponsor

    @acrary
    Great post, I enjoyed it. Some comments ...

    I'm not a mathematician, but I've read that the "Central Limit Theorem" states that increasing the sample size significantly decreases the error that results from the deviation of the current sample from a normal distribution. In fact, that's not hard to see and understand. But in practice, this error is said to have little significance with a sample size of > 30. Any comments on this?


    The idea of dropping extreme results is great. I use this method when running optimizations so that I hopefully don't get trapped by extreme situations.

    The chance of outliers (in dollars) can be reduced by doing the calculations in percent values. That way, e.g., changes in market levels don't have any effect on the numbers.
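
    A tiny Python illustration of the percent idea. The helper name `percent_returns` and the prices are hypothetical, and it assumes long trades given as (entry, exit) pairs.

```python
def percent_returns(trades):
    """Percent gain or loss for long trades given as (entry, exit) pairs."""
    return [(exit_price - entry) / entry * 100.0
            for entry, exit_price in trades]


# The same 2% move at two different market levels (made-up prices).
trades = [(500.0, 510.0), (1000.0, 1020.0)]
points = [exit_price - entry for entry, exit_price in trades]
percents = percent_returns(trades)
print("points: ", points)
print("percent:", percents)
```

    In points the second trade looks twice as large as the first; in percent the two are the same size.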

    Hope this added some value.

    Dierk
     
    #40     Aug 28, 2003