Trying to pick the best from unrelated factors

Discussion in 'Strategy Development' started by bobpit, Oct 18, 2008.

  1. bobpit


    I am trying to design a strategy by examining a great number of seemingly unrelated factors. This could be applied to trading systems, but it is not limited to them.

    I run (automated) tests against thousands of different factors (about 300,000 factors). Some of these factors may be: Day of week, Day of month, Above/below 30 day moving average etc. I came up with about 1000 factors, each one of them giving very decent signals. The combination of all 1000 factors gives great results.

    Now suppose I only have 6 months worth of data. I use 5 months worth of data to design the strategy (and come up with the 1000 factors). As I said, the results look great (on paper). Then I try to verify the strategy on the remaining 1 month of data. Disaster! It looks like the strategy is designed to LOSE.

    What steps should I take?

    Obviously I have fallen into the curve-fitting trap. But how do I get out of it?

    After reading about optimizing automated trading systems, one piece of advice is to examine the big losses and work out why something did not work. Well, this is not applicable in this case.

    One idea is to test each of those factors individually against the 1 month of data. IF they are profitable there (I see consistent results), THEN I can pick this factor as a profitable one. So I may come up with 300-600 factors. And then, their combination will be my system.
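
A rough way to sanity-check that idea is to ask how many factors would pass the 1-month filter by luck alone. The sketch below is only an illustration under stated assumptions: every factor has zero real edge, its daily P&L is independent Gaussian noise, and the holdout month has 21 trading days. The function name and all parameters are hypothetical and are not taken from the thread.

```python
import random

def count_lucky_factors(n_factors=1000, n_days=21, seed=0):
    """Count zero-edge factors that still show a profit on a short holdout."""
    rng = random.Random(seed)
    lucky = 0
    for _ in range(n_factors):
        # Assumption: daily P&L is pure zero-mean noise, i.e. the factor is worthless.
        pnl = sum(rng.gauss(0.0, 1.0) for _ in range(n_days))
        if pnl > 0:
            lucky += 1
    return lucky

if __name__ == "__main__":
    print(count_lucky_factors())
```

Under these assumptions roughly half of the worthless factors come out "profitable" on the holdout, so a count of survivors in the 300-600 range out of 1000 is about what chance alone would produce. Note also that once the holdout month has been used to pick factors, it no longer serves as an independent test.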

    Slippage and commissions are not an issue.

    Any ideas guys?
  2. bobpit


    What? Nobody can help?
  3. You need more data
  4. bobpit


    I know I need more data. But believe me. That's all I have.
  5. Way too many factors - use fewer inputs.

    On a more fundamental note - what exactly are you trying to accomplish? Because from your brief description, it sounds like you are actually TRYING to curve-fit. So you should be commended for achieving what you set out to do. :)
  6. heypa


    Your data is not at fault.
    Your analysis is not at fault.
    You just ran into a buzz saw of a market anomaly.
    No analysis based on data from relatively normal times will adequately fit today's market.
    Wait a while; don't throw it away.
  7. bobpit


    I tried to pick all possible factors that may affect the result, and to analyse them. Some of them MUST affect the result. Others are obviously irrelevant. But I have no way of knowing which is which.
  8. This is what I'm trying to say - you CANNOT tell which factors "must" affect the result by data-mining like this. All you can do is fit curves.

    To determine which factors "must" affect the result, you have to go back to fundamentals of market mechanics and find the causal link between the factor and the consequence. Analysis like this will not get you there.
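
The point about fitting curves can be illustrated with a small simulation. This is a minimal sketch, assuming factors that are pure noise with no link to the market at all; the function name, the sample sizes (2000 factors, 100 in-sample days, 20 out-of-sample days) and the top-20 selection rule are hypothetical choices for illustration only.

```python
import random

def mine_noise(n_factors=2000, n_in=100, n_out=20, top_k=20, seed=1):
    """Pick the best in-sample factors from pure noise and check them out of sample."""
    rng = random.Random(seed)
    factors = []
    for _ in range(n_factors):
        # Assumption: each factor's daily return is zero-mean noise in both periods.
        in_ret = [rng.gauss(0.0, 1.0) for _ in range(n_in)]
        out_ret = [rng.gauss(0.0, 1.0) for _ in range(n_out)]
        factors.append((sum(in_ret) / n_in, sum(out_ret) / n_out))
    # Data-mining step: rank by in-sample average return and keep the top_k.
    factors.sort(key=lambda f: f[0], reverse=True)
    best = factors[:top_k]
    in_avg = sum(f[0] for f in best) / top_k
    out_avg = sum(f[1] for f in best) / top_k
    return in_avg, out_avg

if __name__ == "__main__":
    in_avg, out_avg = mine_noise()
    print("in-sample average of selected factors:", in_avg)
    print("out-of-sample average of the same factors:", out_avg)
```

The selected factors look clearly profitable in-sample simply because they were chosen for that, while their out-of-sample average is drawn from the same zero-mean noise as everything else. The simulation cannot say whether any real factor has a causal link; it only shows that a good in-sample ranking is not evidence of one.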
  9. In the code that generates the trades, set your priorities so that the system produces the minimum number of trade signals.

    Prioritize by direction (long / short), by scalability (use volume to figure out how liquid the strategy is), and by net profitability. You could also optimize the strategy for other performance characteristics like risk-reward or consistency.

    One way to ensure you are not over-optimizing your strategy is to somehow code a correlation between system characteristics and performance degradation: by testing 1000s of systems (or system fragments), find out which aspects lead to degradation and which don't. The ones that don't lead to degradation may have interesting properties that are worth utilizing in your more refined strategy.

    This is probably much harder to do than I am making it sound, in addition to being a form of optimization of its own.
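
One possible reading of the degradation idea, sketched under assumptions: measure degradation as the in-sample average return minus the out-of-sample average return, then correlate it with some attribute of each system. The attribute used here (number of rules), the helper names, and the three toy systems are all hypothetical, made-up values for illustration, not real results.

```python
from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def degradation(in_sample_returns, out_sample_returns):
    """In-sample average return minus out-of-sample average return."""
    return mean(in_sample_returns) - mean(out_sample_returns)

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical toy data: (number of rules, in-sample returns, out-of-sample returns).
systems = [
    (2, [0.010, 0.020, 0.015], [0.010, 0.015, 0.012]),
    (10, [0.030, 0.040, 0.035], [0.010, 0.005, 0.000]),
    (50, [0.080, 0.090, 0.100], [-0.020, -0.010, -0.030]),
]

n_rules = [s[0] for s in systems]
drops = [degradation(s[1], s[2]) for s in systems]
corr = pearson(n_rules, drops)

if __name__ == "__main__":
    print("degradation per system:", drops)
    print("correlation with number of rules:", corr)
```

A strongly positive correlation would suggest that the attribute goes hand in hand with degradation. As the post says, choosing attributes this way is itself a form of optimization, so it needs the same out-of-sample caution.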
  10. bobpit



    Data mining is my only way to analyse them right now. I cannot make sense of the different factors and logically determine which ones really affect the market.

    So, what are my options?
    #10     Oct 19, 2008