Is data mining for trading patterns impossible?

Discussion in 'Data Sets and Feeds' started by bulat, Mar 18, 2005.

  1. This may be somewhat off topic.
    I use a trading plan based on years of experience, recognition, and money management. I have realized that it is time to quantify my plan and approach into a formal 'system', which I would use as a guideline/outline for my trading. The problem I'm having is that I am programming- and software-illiterate and do not know the best software to use for this purpose. Some background: I trade baskets of equities and employ several time frames. I would like to automate/program each of my plans. What would be a good starting point? I have read about how to compose a system but need to make a choice about software. I use RealTick as my trading platform.

    Thanks.

    alex
     
    #61     Apr 21, 2005
  2. Hi alex,

    My experience somewhat parallels yours, except that I got plenty of software experience from a professional background.

    Something I would like to bring up is that one characteristic of a 'good' programming language is that it makes it easy to express the programmer's (well-organized) thoughts, i.e. it becomes a kind of extension of your brain.

    A well-written program in a 'good' language will not only run on a computer but also become a rigorous record of your thought/reasoning processes for later reference. Once you are far enough along, your reasoning and programming will (often) even flow along together nicely.

    Now, what is a 'good' programming language? Discussions about this point often resemble religious infighting or arguments about the merits of cats vs. dogs as pets. Some points to watch for are: a minimum amount of clutter that you have to write and carry along; efficiency, i.e. the time required to program/debug a task; and your freedom, i.e. not becoming a cash cow to be milked by devilish monopolists, so look for portability and OS independence.

    That is my best advice; it is not an easy problem. Furthermore, having picked the 'truly best language', don't be under the illusion that things are going to be easy, for two reasons:

    (1) Considerable skill is required before you start to master a new language. Even for a well-experienced programmer, switching to a new language requires (lots of) time to acquire a degree of virtuosity comparable to what he had (or thought he had) in other languages;
    (2) For the inexperienced, 'conversing' with a computer is mostly a shocking experience, as one discovers how inconsistent the brain's reasoning really is. This last point is the most difficult one to overcome in building computerized trading systems. You 'think' that you 'knew for sure' about certain things, but your computer 'teaches' you that things are not really that sure at all. However, if you ever manage to capture a single good idea in a program that your computer is happy with (i.e. one that makes money in the market), that will be hard to beat.
    (3) One final note: after you have gone through all this, you will be perfectly convinced of the proposition that NOBODY is EVER going to sell you a proggie, data mining or otherwise, that will truly start pumping money for you out of those deep market wells. :D
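To make point (2) concrete, here is a minimal Python sketch of capturing one trading idea as code and letting the computer check it. The rule (hold the asset from one close to the next whenever today's close is above its trailing simple moving average) is a hypothetical stand-in, not anyone's actual plan; the prices are made up, and costs and slippage are ignored.

```python
from typing import List, Optional


def moving_average(prices: List[float], window: int) -> List[Optional[float]]:
    """Trailing simple moving average; None until `window` prices are available."""
    out: List[Optional[float]] = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window : i + 1]) / window)
    return out


def backtest_above_ma(prices: List[float], window: int) -> List[float]:
    """Hypothetical rule: hold from close t to close t+1 if close t is above its MA."""
    ma = moving_average(prices, window)
    daily: List[float] = []
    for t in range(len(prices) - 1):
        in_market = ma[t] is not None and prices[t] > ma[t]
        daily.append(prices[t + 1] / prices[t] - 1.0 if in_market else 0.0)
    return daily


def total_return(daily: List[float]) -> float:
    """Compound per-period returns into one total return."""
    growth = 1.0
    for r in daily:
        growth *= 1.0 + r
    return growth - 1.0


# Made-up prices, for illustration only.
prices = [10.0, 11.0, 12.0, 13.0, 12.0, 11.0, 12.0, 13.0]
daily = backtest_above_ma(prices, window=3)
print(daily, total_return(daily))
```

Writing the rule this way forces every vague notion ("above the average", "next day") to become an exact, testable statement, which is where the surprises described in (2) tend to show up.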

    Be good,
    nononsense
     
    #62     Apr 22, 2005
  3. mind


    Well, yes. The one-trade-per-year entry might be a little drastic...
     
    #63     Apr 22, 2005
  4. Absolutely correct.

    Not true. You've made an excellent observation but drawn the wrong conclusion. To read about techniques for separating them out, see "Statistics for Experimenters" by Box, Hunter, and Hunter. (Bonus: they just came out with a new edition.)
     
    #64     Apr 22, 2005
  5. http://www.edge-fund.com/measurement.html

    http://www.edge-fund.com/Hard02.pdf

    HARDING, David, A Critique of the Sharpe Ratio
    "The most basic problem with the Sharpe ratio is that whilst return is a definite and meaningful quantity (an “observable”), risk is not. It is true that standard deviation can be calculated from any time series of return data, but it is not at all true that its “meaning” will be the same for all time series. For the standard deviation to be a meaningful statistic at all the return time series must be generated from a process that is both stationary and parametric."
    :confused:
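Harding's stationarity point can be illustrated with a made-up return series whose volatility triples halfway through. The full-sample standard deviation comes out at about 0.022, which describes neither the calm regime (0.01) nor the volatile one (0.03). This is a toy sketch using Python's standard `statistics` module, not Harding's own analysis.

```python
import statistics
from typing import List, Tuple


def regime_stds(returns: List[float]) -> Tuple[float, float, float]:
    """Population std of the first half, the second half, and the whole series."""
    mid = len(returns) // 2
    return (
        statistics.pstdev(returns[:mid]),
        statistics.pstdev(returns[mid:]),
        statistics.pstdev(returns),
    )


# Made-up series: calm regime (+/-1%) followed by a volatile regime (+/-3%).
returns = [0.01, -0.01] * 50 + [0.03, -0.03] * 50
first, second, whole = regime_stds(returns)
print(round(first, 4), round(second, 4), round(whole, 4))
```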
     
    #65     Apr 22, 2005
  6. I think Harding picked the wrong parameter to worry about. Over in the Mechanical Investing group at Motley Fool, people have consistently found that backtested returns do not materialize in practice, whereas backtested standard deviations are very predictive of future standard deviations.
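The Motley Fool observation can be reproduced mechanically with a toy simulation, under an assumption built in on purpose: every strategy has zero true edge but its own fixed volatility. Splitting each simulated track record into a 'backtest' half and a 'forward' half, the backtest means say essentially nothing about the forward means, while the backtest standard deviations line up closely with the forward ones. The strategy count, volatility range, and seed are arbitrary choices for illustration, and nothing here is evidence about real markets.

```python
import numpy as np


def backtest_vs_forward(n_strategies: int = 200, n_days: int = 500, seed: int = 0):
    """Correlation of backtest vs forward means, and of backtest vs forward stds,
    across simulated strategies that have zero true edge by construction."""
    rng = np.random.default_rng(seed)
    vols = rng.uniform(0.005, 0.03, size=n_strategies)  # each strategy's true daily vol
    returns = rng.normal(0.0, vols[:, None], size=(n_strategies, n_days))
    half = n_days // 2
    backtest, forward = returns[:, :half], returns[:, half:]
    mean_corr = np.corrcoef(backtest.mean(axis=1), forward.mean(axis=1))[0, 1]
    std_corr = np.corrcoef(backtest.std(axis=1), forward.std(axis=1))[0, 1]
    return mean_corr, std_corr


mean_corr, std_corr = backtest_vs_forward()
print(f"means: {mean_corr:.2f}, stds: {std_corr:.2f}")
```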

    The main critique of the Sharpe ratio is that it punishes "upside variance". That is, if your model's returns swing widely but almost always to the upside (very high but consistent returns), those swings will increase the standard deviation and reduce the Sharpe ratio despite the apparent lack of risk. I think that's fine. A bias toward caution when dealing with very high-return models is wise. There are attempts to address this "flaw", such as the Sortino ratio, but I am deeply suspicious of them.
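For reference, here is a minimal sketch of both ratios on per-period (not annualized) returns, assuming a zero risk-free rate/target and the common downside-deviation definition for Sortino (root mean square of below-target returns taken over all periods). Series `b` is series `a` with one period boosted from +1% to +20%, so it is at least as good in every period, yet its Sharpe ratio drops from about 1.26 to about 0.44, while its Sortino ratio rises because upside swings are ignored.

```python
import math
from typing import Sequence


def sharpe_ratio(returns: Sequence[float], risk_free: float = 0.0) -> float:
    """Mean excess return divided by its sample standard deviation (per period)."""
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    var = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    return mean / math.sqrt(var)


def sortino_ratio(returns: Sequence[float], target: float = 0.0) -> float:
    """Mean excess return divided by downside deviation: only below-target periods count."""
    excess = [r - target for r in returns]
    mean = sum(excess) / len(excess)
    downside = math.sqrt(sum(min(x, 0.0) ** 2 for x in excess) / len(excess))
    return mean / downside if downside > 0 else math.inf


# Made-up per-period returns: b equals a except one period jumps from +1% to +20%.
a = [0.01] * 9 + [-0.01]
b = [0.01] * 8 + [0.20, -0.01]
print(sharpe_ratio(a), sharpe_ratio(b))    # b is lower despite never being worse
print(sortino_ratio(a), sortino_ratio(b))  # b is higher: upside swings are ignored
```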

    Martin
     
    #66     Apr 22, 2005
  7. Yeah. :)

    I've discussed this topic a lot at the Motley Fool board I just mentioned, and they really do try to backtest strategies that trade only once a year for tax reasons (although they enter more than one position).

    Martin
     
    #67     Apr 22, 2005
  8. mind

    Agreed on the upside variance. In hindsight, I have always found it a pretty good estimate to assume that whatever can happen to the upside within a given period can happen to the downside as well. Applied to standard deviation, that means I prefer to be conservative and thus use Sharpe. But, as I said, I am currently changing my point of view.
    I think it is good to drop Sharpe for something better, but trading it for TradeStation output is no option IMHO.

    peace
     
    #68     Apr 22, 2005
  9. You bet! Good analytics are hard to find.
     
    #69     Apr 22, 2005
  10. prophet

    That's a bit of a contradiction. The whole point of doing the statistical tests, and more importantly of using a statistically significant data set (and/or a significant number of trades), is to estimate the probabilities that the 100 strategies will work walking forward. Better yet, if you can automate the optimization, try optimizing over repeated, independent trials.
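One way to read "optimizing over repeated, independent trials" is walk-forward testing: choose the best parameter on a training window, score it on the next, unseen window, then roll forward so the test windows never overlap. The sketch below is generic Python; `score` is a hypothetical callback you would supply (for example, the net return of your system with parameter `p` on a data segment), and the toy data and score in the usage lines are made up.

```python
from typing import Callable, List, Sequence, Tuple


def walk_forward(data: Sequence[float],
                 params: Sequence[int],
                 score: Callable[[int, Sequence[float]], float],
                 train_len: int,
                 test_len: int) -> List[Tuple[int, float]]:
    """Pick the best parameter on each training window, then record its score
    on the following, non-overlapping test window."""
    results: List[Tuple[int, float]] = []
    start = 0
    while start + train_len + test_len <= len(data):
        train = data[start:start + train_len]
        test = data[start + train_len:start + train_len + test_len]
        best = max(params, key=lambda p: score(p, train))
        results.append((best, score(best, test)))
        start += test_len  # roll forward so test windows never overlap


    return results


def toy_score(p: int, segment: Sequence[float]) -> float:
    """Hypothetical score: how close parameter p is to the segment mean."""
    return -abs(sum(segment) / len(segment) - p)


data = [1.0] * 10 + [3.0] * 10  # made-up series with a level shift halfway
results = walk_forward(data, [1, 2, 3], toy_score, train_len=5, test_len=5)
print(results)
```

Only the out-of-sample scores are collected: the parameter chosen before the level shift scores poorly on the first window after it, which is exactly the kind of failure an in-sample optimization would hide.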

    Overoptimization comes from searching over too many degrees of freedom without enough data points (trades or regular-time-interval returns) to rule out chance, or to rule out systems that simply don't generalize walking forward. There is a balance between these two factors: if you add more degrees of freedom, you had better also test over a longer stretch of data and more (non-correlated) markets.
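The balance between degrees of freedom and data can be seen in a small simulation, taking the number of strategies tried as a rough stand-in for degrees of freedom (an assumption for illustration). Every strategy is pure noise, so any positive Sharpe ratio is luck; the best of 200 looks impressive on 50 periods and shrinks toward zero as the number of periods grows, roughly in proportion to 1/sqrt(periods).

```python
import numpy as np


def best_noise_sharpe(n_strategies: int, n_periods: int, seed: int = 0) -> float:
    """Highest in-sample per-period Sharpe ratio among pure-noise strategies."""
    rng = np.random.default_rng(seed)
    returns = rng.normal(0.0, 0.01, size=(n_strategies, n_periods))  # zero true edge
    sharpe = returns.mean(axis=1) / returns.std(axis=1, ddof=1)
    return float(sharpe.max())


for n_periods in (50, 500, 5000):
    print(n_periods, round(best_noise_sharpe(200, n_periods), 3))
```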
     
    #70     Apr 22, 2005