Curve fitting

Discussion in 'Automated Trading' started by boza, Dec 13, 2017.

  1. Simples

    Let's understand "curve fitting" and "overfitting". Say you write an algo that automatically writes Shakespeare's "Romeo and Juliet". That was your goal for the algo, and it does so with minimal input and parameters, so it can also be thought of as a compression algorithm of sorts: i.e. "the knowledge" of how to recreate the perfect response is embedded in your algo. But is this "enough" for all possible use cases?

    Later, when you continue to run the algo, would you expect it to successfully write the sequel, "Romeo and Juliet II"? If not, maybe the process behind your algo is different from the process behind Shakespeare (the market)?

    So "curve fitting" or "overfitting" fits your algo's output to your goals, but doesn't really achieve longer-term goals like future profitability, because the algo might fail miserably going forward, out of sample. Indeed, there are many reasons it may fail, both due to the method (algo) and because of external factors (Shakespeare's death / fundamental market changes). Some things are just impossible to perfect, but maybe "good enough" will do?

    Say you failed out of sample, and spent 100 attempts at remaking your algo. Your "out of sample" is now "in sample", due to your very process of creating one algo iteratively: each attempt was kept or discarded based on that same data. So the algo can still fail going forward into the future, even though it performs wonderfully both in sample and out of sample!
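
    The "100 attempts" effect can be illustrated with a rough sketch. Everything in it is an assumption made for illustration, not anything from a real backtester: each "algo" is just zero-mean Gaussian noise with no edge at all, the function names and parameters (n_attempts, n_days, n_trials) are hypothetical, and "performance" is simply the mean daily return. The attempt that looks best on the out-of-sample data is kept, and then checked on fresh data that played no part in the selection.

```python
import random
import statistics


def noise_returns(rng, n_days):
    """Daily returns of an algo with no real edge: zero-mean noise (assumption)."""
    return [rng.gauss(0.0, 0.01) for _ in range(n_days)]


def pick_best_of_many(rng, n_attempts=100, n_days=50):
    """Try n_attempts edgeless algos and keep the one with the best
    out-of-sample mean return. Returns (selected out-of-sample mean,
    mean return of that same algo on untouched future data)."""
    best_oos = float("-inf")
    best_future = 0.0
    for _ in range(n_attempts):
        oos_mean = statistics.fmean(noise_returns(rng, n_days))
        future_mean = statistics.fmean(noise_returns(rng, n_days))
        if oos_mean > best_oos:
            # Selection looks only at the out-of-sample number,
            # which is exactly what turns it into in-sample data.
            best_oos = oos_mean
            best_future = future_mean
    return best_oos, best_future


def average_over_trials(n_trials=50, seed=0):
    """Repeat the whole selection process and average both numbers."""
    rng = random.Random(seed)
    results = [pick_best_of_many(rng) for _ in range(n_trials)]
    avg_selected_oos = statistics.fmean(r[0] for r in results)
    avg_future = statistics.fmean(r[1] for r in results)
    return avg_selected_oos, avg_future


if __name__ == "__main__":
    selected, future = average_over_trials()
    print(f"selected out-of-sample mean: {selected:.5f}")
    print(f"same algos on fresh data:    {future:.5f}")
```

    Under these assumptions the selected out-of-sample figure comes out clearly positive, while the same algos on fresh data average close to zero. The gap is produced purely by picking the best of many attempts, since none of the algos has any edge.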

    Because the nature of backtesting and optimizing is to "curve fit", performance is likely to be worse going forward. There's nothing wrong with "curve fitting" per se, but if results fail because of "overfitting", then it was bad. You might also see short-term performance exceed backtesting (unf.likely), though; the random nature of the markets makes it hard to see what works and why.
     
    Last edited: Dec 31, 2017
    #31     Dec 31, 2017
  2. Mysteron

    Basically, the problem is one of System Identification, for which there is a vast literature in control systems and econometrics. If you really want an answer to your question, then you need to ask people who have an understanding of system modelling, development, testing, and performance analysis. You are unlikely to get that on these forums.
     
    #32     Dec 31, 2017
  3. Thank you Simple for your explanation.

    This is a good explanation.
     
    #33     Jan 1, 2018