Optimization, curve-fitting and probability

Discussion in 'Strategy Building' started by NinjaTrader_Dierk, Aug 27, 2003.

  1. The more systems you build, the fewer optimization steps it takes to find a winner.
     
    #11     Aug 27, 2003
  2. DT-waw

    droth,

    Read 'Optimization, The Double-Edged Sword'
    Quote from the article (page 3):

    "One of my observations over years of strategy development is that profitability of a strategy is inversely proportional to its complexity. Keeping this rule in mind, you should avoid too many signals. Each additional signal you add to the strategy increases the possibility that all of the signals together, in combination, are curve-fitted for the particular historical data. So keep the number of signals in your strategy to a minimum to assure that in combination they are not over-optimized on the data"


    I also really like the following statement written by vegasoul:

    "A robust system makes very little assumptions about the nature of the market environment and use only very universal and general rules in its design"

    The length of the backtesting period is equally important. The idea is to backtest as-simple-as-possible systems on an as-long-as-possible period. Doing the opposite increases the risk of curve-fitting.
     
    #12     Aug 27, 2003
  3. maxpi

    You are going to take over for Jack someday
     
    #13     Aug 27, 2003
  4. Thanks for your good suggestion! :D

    But without Jack, I'm diminishing. :mad:

    Hmm, wait a minute. Are you implying Jack=Odd? :confused:
     
    #14     Aug 27, 2003
    >Katz/McCormick focused me again on the t-test.
    >Scenario:
    >The in-sample test of a strategy in this book resulted in
    > - #trades: 118

    The t-test is normally used for small samples, although it is of course also valid for large samples, since Student's distribution converges to the normal distribution. But each (individual) variable in the sample should follow a normal law with the same variance, and the variables must be independent. In some phenomena these hypotheses are rather reasonable (if you measure a table with the same instrument and the same protocol, it is reasonable to assume the error is due to randomness), but not in unknown phenomena. As I like to say, checking the validity of the premises is much more important than calculating the significance of the test, since that significance has no significance at all if the basic hypotheses are not checked.
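
    If it helps, here is a minimal Python sketch of the one-sample t-test being discussed. The trade P&L values are made-up placeholders, and scipy is assumed to be available:

        # One-sample t-test: is the mean trade P&L significantly different from 0?
        # The P&L values below are illustrative placeholders, not real results.
        from scipy import stats

        trade_pnl = [12.5, -8.0, 3.2, 20.1, -15.4, 7.7, 9.3, -2.1, 14.0, -5.6]

        t_stat, p_value = stats.ttest_1samp(trade_pnl, popmean=0.0)
        print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
        # Caveat from above: this p-value means nothing unless the premises hold
        # (independent trades, roughly normal P&L, common variance).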

    >The statistical significance is adjusted: 1 - power(1-0.0184, 20) = 0.3103 = 31.03%
    >(Detail: Can anybody explain the concept behind the last step? It's somewhat unclear to me.)
    Supposing that the premises of the Student test (more generally, of any parametric test) are reasonable (which is not evident, as said above, but let's do as if they were), this says: if "the (true but unknown) mean were equal to 0" (the null hypothesis H0), the probability of seeing such a good result by pure chance would be alpha = 0.0184. The basic axiom of probability says that P(A) + P(not A) = 1 (since 1 is the probability of certainty :) ), so the probability of NOT seeing such a result by chance in a single test is 1 - 0.0184. This is called the significance of the test.
    After that, if an experiment E is repeated independently, P(E1 and E2 and ... and En) = P(E1)*P(E2)*...*P(En); this explains power(1-0.0184, 20), the probability that none of the 20 optimization runs produces such a result by chance. Subtracting from 1 gives the adjusted figure: the probability that at least one of the 20 runs looks this good by luck alone.
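
    In code form, that last step looks like this (a small Python sketch using the alpha and run count from the example above):

        # Adjust significance for 20 independent optimization runs.
        alpha = 0.0184                 # chance that ONE run looks this good under H0
        runs = 20

        p_none = (1 - alpha) ** runs   # probability that NO run looks this good by chance
        p_any = 1 - p_none             # probability that AT LEAST ONE does
        print(f"{p_any:.4f}")          # -> 0.3103, i.e. 31.03%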

    >Now my idea:
    >Improving statistical significance could be the target of an optimization process. It could make sense to not only focus on good past performance but also (or only!!) look for maximum statistical significance. This might be a way to overcome (or at least soften) the problem of over-optimization and curve-fitting.
    "your" idea is already the foundation of statistical decision theory : The theory of statistical estimator is based on the concept of efficiency like in stock market. Nevertheless efficiency here is not as fuzzy :D. In statiscal theory optimality of estimation of a parameter means 3 things: consistancy (when the size of the sample grows, the estimator must converge towards the true parameter), no bias (for example the empirical standard deviation (of the sample) is a biased estimation of the true standard deviation (of the population) so one must correct by square-root of (n-1)), and efficiency which the smallest variance (if you vary the parameter this variance will also vary so that there is an optimum). The best accepted method to find this optimal is the maximum likelyhood which uses the formula above of multiplication of independant probability to find the roots of partial differential equations which will give the optimal value parameters. THE PROBLEM IS THE PROBABILITY LAW MUST BE KNOWN. This is common sense: you can't create knowledge from thin air :D

     
    #15     Aug 31, 2003
  6. Dear bdixon,

    You gave away the big secret.

    Let me restate it my way: "If you're still optimizing, you're not there yet."

    Be good,

    nononsense
     
    #16     Aug 31, 2003
  7. Dear Harrytrader,

    It is good you state this once again. A lot of confusion results in some threads from people acting smart while not really knowing what they are talking about.

    Be good,

    nononsense

    P.S. You have very neat looking graphs!
     
    #17     Aug 31, 2003
  8. Yeah, exactly. Like you tooting around on the scalping thread that you're doing roughly 500 trades per day, amongst other BS all over ET.
    No offence, it just crossed my mind.

    Be good,

    Scientist
     
    #18     Aug 31, 2003
  9. droth

    I mentioned in another post George Pruitt's article on systems in the August issue of Active Trader magazine. It might save you some time. It may be available online.
     
    #19     Aug 31, 2003
  10. You big fool, learn how to read first before BS'ing around here. Go get some more lessons at bubba's.

    Be good,

    nononsense
     
    #20     Aug 31, 2003