Is data mining for trading patterns impossible?

Discussion in 'Data Sets and Feeds' started by bulat, Mar 18, 2005.

  1. bulat

    That's true, but it brings up a whole new set of problems. Bonferroni correction simply requires an ever more stringent level of statistical significance as the number of hypotheses tested goes up. But if you are doing large-scale data mining, where you are searching billions of patterns, then the only system that would ever pass a significance test with Bonferroni correction is one that makes virtually astronomical profits. So even if there are valid patterns in the data, they would never pass your test.

    In other words, you go from a Type I error (mistaking random patterns for meaningful ones) to a Type II error (mistaking meaningful patterns for random ones), because the test becomes an impossible hurdle to jump over for any pattern, real or random.
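
    A rough sketch of how the Bonferroni hurdle rises with the number of patterns tested. It assumes a one-sided z-test at a family-wise level of 0.05 and ten years of returns, and it uses the approximation that the t-statistic of the mean return is about the annualized Sharpe ratio times the square root of the number of years. The function names and all the numbers are illustrative assumptions, not anything taken from a real backtest.

```python
import math
from statistics import NormalDist


def bonferroni_alpha(family_alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return family_alpha / n_tests


def required_z(family_alpha, n_tests):
    """One-sided z-score a single pattern must exceed to pass the corrected test."""
    per_test_alpha = bonferroni_alpha(family_alpha, n_tests)
    # Use the lower tail and negate it; this is more accurate than 1 - tiny_number.
    return -NormalDist().inv_cdf(per_test_alpha)


def required_annual_sharpe(family_alpha, n_tests, years):
    """Annualized Sharpe ratio needed, assuming t-stat ~= Sharpe * sqrt(years)."""
    return required_z(family_alpha, n_tests) / math.sqrt(years)


if __name__ == "__main__":
    years = 10  # assumed length of the backtest
    for n_tests in (1, 1_000, 1_000_000, 1_000_000_000):
        z = required_z(0.05, n_tests)
        sharpe = required_annual_sharpe(0.05, n_tests, years)
        print(f"{n_tests:>13,} tests: z > {z:.2f}, annual Sharpe > {sharpe:.2f}")
```

    Running it shows the hurdle for each search size, so you can judge whether a realistic pattern could clear it over the history you actually have.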

    -bulat
     
    #51     Apr 20, 2005
  2. bulat

    You are obviously correct that it is a pure curve fit, since the system performs poorly on all the out-of-sample data, both before and after the test period that was posted.

    -bulat
     
    #52     Apr 20, 2005
  3. Well here you are again. And again you drive by the intersection where you should have turned. Patterns don't work because the underlying data series is random a significant part of the time.

    If instead of trying to find a pattern within random data, you looked for periods when the data is not random (and then find recurring patterns), you would have a chance to make money.

    Some of you must have the ability for abstract thought. Go get yourself a copy of "The mathematics of technical analysis" by Cliff Sherry. Do the exercises and figure it out. Sheesh.

    Ordinarily I would wish you "good luck". Instead I will just hope you learn to identify profitable patterns within a non-random data series before your money runs out.

    :D

    Edit:

    Please no PMs. This is the basic fucking stats and probability from school. Actually, it is taught at the 200 level, or in the second half of the first year of statistics.
     
    #53     Apr 20, 2005
  4. bulat

    As much as I enjoy feedback from people without a clue, yet a feeling of great self-importance, please don't post to this thread if you intend to be rude while adding absolutely nothing useful to the discussion.

    Why would anyone PM you about this? It's pretty obvious that you have nothing useful to share here.
     
    #54     Apr 20, 2005
  5. It is possible
    http://www.trade-ideas.com/Help.html#GBBOT

    and works very well
     
    #55     Apr 20, 2005
  6. If your data mining method produces no significant results, you should look for a different data mining method rather than relaxing your statistical requirements. For example, do not test billions of possible models. This is a recipe for failure. There are alternatives. Develop strategies incrementally rather than doing an exhaustive search. Look for a smooth local parameter space where similar strategies give similar results. Use separate testing and validation datasets. There are many possibilities. But you ignore the multiple hypothesis problem at your peril.
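
    Two of these ideas can be sketched in a few lines: holding back separate validation and test data, and preferring parameters whose neighbors also score well. This is only a sketch under assumptions: the split is chronological with made-up 60/20/20 proportions, the strategy has a single integer parameter, and `chronological_split` and `neighborhood_score` are hypothetical helper names.

```python
from statistics import mean


def chronological_split(series, train_frac=0.6, validation_frac=0.2):
    """Split a time-ordered series into train, validation and test parts.

    The order is preserved (no shuffling), so later data never leaks into
    the part used to develop the strategy.
    """
    n = len(series)
    train_end = round(n * train_frac)
    validation_end = train_end + round(n * validation_frac)
    return series[:train_end], series[train_end:validation_end], series[validation_end:]


def neighborhood_score(scores, param, radius=1):
    """Average score of a parameter value and its neighbors within `radius`.

    `scores` maps an integer parameter value to a backtest score. An isolated
    spike gets pulled down by its neighbors; a smooth plateau does not.
    """
    nearby = [scores[p] for p in range(param - radius, param + radius + 1) if p in scores]
    return mean(nearby)


if __name__ == "__main__":
    train, validation, test = chronological_split(list(range(10)))
    print(train, validation, test)

    # Made-up scores: parameter 10 is a lone spike, 20..22 form a plateau.
    scores = {9: 0.1, 10: 2.0, 11: 0.2, 20: 0.9, 21: 1.0, 22: 0.9}
    print(neighborhood_score(scores, 10), neighborhood_score(scores, 21))
```

    The test part should be touched only once, after the strategy and its parameters are fixed; every extra look at it is another hypothesis tested.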

    As I said, human intuition is subject to the same flaw as the t-test. We want to believe that our results are more significant than they actually are. This is why the vast majority of backtested trading strategies underperform when they are tested post discovery.

    Martin
     
    #56     Apr 20, 2005
  7. would you rather ride in a car driven by a 16-year-old kid, or by software written by the greatest minds?

    i don't think you can really beat developing your best judgement, and looking at everything in every situation
     
    #57     Apr 20, 2005
  8. mind

    after having witnessed and participated in a number of discussions similar to this one, after having traded several quantitative strategies with several million dollars quite successfully for some time before they started to lose their edge, after having listened to different sorts of traders, who i think do well yet are on very different levels of sophistication, i come to the conclusion that we are taking a very important variable out of the equation: our own neural net and our own consciousness.

    the posts within this thread indicate it very well IMHO. it is not necessarily the criteria that decide whether we make it or not. it is the way, intensity and decisiveness of our search. if alan did well with his way of searching it was probably because he knew so much about the market and tested so many, many setups, that he finally found tradeable setups within the randomness and could tell by his experience, and found some way to "prove" this "experience".

    i always used sharpe ratio as my main criterion to tell about validity of an approach. now i tend to think that the number of trades is very important. if i have 4000 days and i trade on a third of them by entering at the open and getting out at the close, and i use just two or three parameters, i am very confident that there is "something", even if my sharpe ratio is below 1.
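
    a rough way to see why the length of the sample matters as much as the sharpe ratio itself. this is a sketch that assumes independent daily returns and 252 trading days a year, with days without a trade counted as zero-return days; `t_stat_from_sharpe` is a made-up name and 0.8 is just an example value below 1.

```python
import math


def t_stat_from_sharpe(annual_sharpe, n_days, days_per_year=252):
    """Approximate t-statistic of the mean daily return.

    With independent daily returns, mean/std * sqrt(n_days) equals the
    annualized Sharpe ratio times sqrt(n_days / days_per_year).
    """
    return annual_sharpe * math.sqrt(n_days / days_per_year)


if __name__ == "__main__":
    # Same Sharpe ratio, very different evidence depending on sample length.
    for n_days in (252, 1000, 4000):
        print(n_days, round(t_stat_from_sharpe(0.8, n_days), 2))
```

    this ignores the number of setups and parameters that were tried before settling on the final one, which is the multiple testing issue discussed earlier in the thread.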
     
    #58     Apr 21, 2005
  9. mind

    #59     Apr 21, 2005
  10. Of course the number of trades is important. Once again it is a matter of statistical validity. If a system gives you one entry a year which you hold for a full year, and at the end of the year you are ahead of the market, you literally cannot draw any statistically valid conclusion about your system's real-world performance. If you make 1000 trades a year with a 60% win rate, you know with extremely high confidence that your system did not get those results by chance. This follows from the law of large numbers.
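
    The two cases can be compared with an exact one-sided binomial test. This is a minimal sketch that assumes trades are independent and that a system with no edge wins each trade with probability 0.5; it looks only at win rate and says nothing about the size of wins and losses. `binomial_tail` is a name chosen for illustration.

```python
import math


def binomial_tail(n_trades, n_wins, p_win=0.5):
    """Exact probability of at least n_wins wins in n_trades under pure chance."""
    return sum(
        math.comb(n_trades, k) * p_win**k * (1 - p_win) ** (n_trades - k)
        for k in range(n_wins, n_trades + 1)
    )


if __name__ == "__main__":
    # One trade held for a year that happened to win.
    print("1 win out of 1 trade:     ", binomial_tail(1, 1))
    # 1000 trades with a 60% win rate.
    print("600 wins out of 1000 trades:", binomial_tail(1000, 600))
```

    A single winning trade is as likely as a coin flip, while 600 wins out of 1000 is very unlikely under chance alone, provided that this system was the only one tested.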

    Martin
     
    #60     Apr 21, 2005