Is data mining for trading patterns impossible?

Boy Plunger · Apr 21, 2005

This may be somewhat off topic.
I use a trading plan based on years of experience, recognition, and money management. I have realized that it is time to quantify my plan and approach into a formal 'system' that I would use as a guideline for my trading. The problem I'm having is that I am programming- and software-illiterate and do not know the best software to use for this purpose. Some background: I trade baskets of equities and employ several time frames. I would like to automate/program each of my plans. What would be a good starting point? I have read about how to compose a system but need to make a choice about software. I use RealTick as my trading platform.

Thanks.

alex

nononsense · Apr 22, 2005

Quote from Cluseau:

This may be somewhat off topic.
I use a trading plan based on years of experience, recognition, and money management. I have realized that it is time to quantify my plan and approach into a formal 'system' that I would use as a guideline for my trading. The problem I'm having is that I am programming- and software-illiterate and do not know the best software to use for this purpose. Some background: I trade baskets of equities and employ several time frames. I would like to automate/program each of my plans. What would be a good starting point? I have read about how to compose a system but need to make a choice about software. I use RealTick as my trading platform.

Thanks.

alex

Hi alex,

My experience somewhat parallels yours, except that I got plenty of software experience from a professional background.

Something I would like to bring up: one characteristic of a 'good' programming language is that it makes it easy to express the (well-organized) thoughts of the programmer, i.e. it becomes a kind of extension of your brain.

A well-written program in a 'good' language will not only run on a computer but also become a rigorous record of your thought/reasoning processes for later reference. Once you are far enough along, your reasoning and programming will (often) even flow together nicely.

Now what is a 'good' programming language? Discussions about this point often resemble religious infighting or arguments about the merits of cats versus dogs as pets. Some points to watch for are: a minimum amount of clutter that you have to write and carry along; efficiency, i.e. the time required to program/debug a task; and your freedom, i.e. not becoming a cash cow to be milked by devilish monopolists: look for portability and OS independence.

This is my best advice: it is not an easy problem. Furthermore, even after picking the 'truly best language', don't be under the illusion that things are going to be easy, for two reasons:

(1) Considerable skill is required before you start to master a new language. Even for a well-experienced programmer, switching to a new language requires (lots of) time to acquire a degree of virtuosity comparable to what he had (or thought he had) in other languages;
(2) For the inexperienced, 'conversing' with a computer is mostly a shocking experience, as one discovers how inconsistent the brain is in its reasoning. This last point is the most difficult one to overcome in building computerized trading systems. You 'think' that you 'knew for sure' about certain things, but your computer 'teaches' you that things are not really that sure at all. However, if you ever manage to capture a single good idea in a program that your computer is happy with (i.e. one that makes money in the market), that will be hard to beat.
(3) One final note: after you have gone through all this, you will be perfectly convinced that NOBODY is EVER going to sell you a proggie, data mining or other, that will truly start pumping money for you out of those deep market wells.

Be good,
nononsense

mind · Apr 22, 2005

Quote from Sparohok:

Of course the number of trades is important. Once again it is a matter of statistical validity. If a system gives you one entry a year which you hold for a full year, and at the end of the year you are ahead of the market, you literally cannot draw any statistically valid conclusion about your system's real world performance. If you make 1000 trades a year with a 60% win rate, you know with extremely high confidence that your system did not get those results by chance. This follows from what is known in statistics as the law of large numbers.

Martin

well. yes. the one trade per year entry might be a little drastic ...
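The contrast in the quoted post can be checked with an exact one-sided binomial tail. This is only a sketch under a strong assumption that is not in the thread: each trade is treated as an independent fair coin flip under the null hypothesis, and `p_value_at_least` is a hypothetical helper name.

```python
from math import comb

def p_value_at_least(wins, trades):
    """Chance of seeing `wins` or more winners in `trades` fair coin flips."""
    favourable = sum(comb(trades, k) for k in range(wins, trades + 1))
    return favourable / 2 ** trades

# One trade that beat the market: indistinguishable from luck.
one_trade = p_value_at_least(1, 1)

# 600 winners out of 1000 trades (a 60% win rate).
many_trades = p_value_at_least(600, 1000)

print(f"1 win in 1 trade:       p = {one_trade}")
print(f"600 wins in 1000 trades: p = {many_trades:.3g}")
```

Real trades are rarely independent and a win rate says nothing about the size of wins and losses, so the second figure should be read as an upper bound on the comfort it gives, not as proof of an edge.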

PetaDollar · Apr 22, 2005

Quote from bulat:

So even if there are meaningful patterns that your search discovers, they will be intermingled with numerous patterns that work simply by chance.

Absolutely correct.

And there is absolutely no way to actually separate them out.

Not true. You've made an excellent observation but drawn the wrong conclusion. To read about techniques for separating them out, see "Statistics for Experimenters" by Box, Hunter, and Hunter. (Bonus: they just came out with a new edition.)

OddTrader · Apr 22, 2005

Quote from mind:

i always used sharpe ratio as my main criteria to tell about validity of an approach.

Q
http://www.edge-fund.com/measurement.html

http://www.edge-fund.com/Hard02.pdf

HARDING, David, A Critique of the Sharpe Ratio
"The most basic problem with the Sharpe ratio is that whilst return is a definite and meaningful quantity (an “observable”), risk is not. It is true that standard deviation can be calculated from any time series of return data, but it is not at all true that its “meaning” will be the same for all time series. For the standard deviation to be a meaningful statistic at all the return time series must be generated from a process that is both stationary and parametric." UQ

Sparohok · Apr 22, 2005

Quote from OddTrader:

It is true that standard deviation can be calculated from any time series of return data, but it is not at all true that its “meaning” will be the same for all time series.

I think Harding picked the wrong parameter to worry about. Over in the Mechanical Investing group at Motley Fool, people have consistently found that backtested returns do not materialize in practice, whereas backtested standard deviations are very predictive of future standard deviations.

The main critique of the Sharpe ratio is that it punishes "upside variance". That is, if your model is highly non-stationary (very high but consistent returns), that will increase the standard deviation and reduce the Sharpe ratio despite the apparent lack of risk. I think that's fine. A bias toward caution when dealing with very high return models is wise. There are attempts to address this "flaw", such as the Sortino ratio, but I am deeply suspicious of them.

Martin
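To make the "upside variance" point concrete, here is a minimal sketch comparing the two ratios on made-up per-period returns. The numbers are illustrative only, the ratios are not annualized, and the Sortino definition used (mean excess return over the root mean square of below-target returns) is one common convention, assumed here rather than taken from the thread.

```python
from math import sqrt
from statistics import mean, stdev

def sharpe(returns, risk_free=0.0):
    """Mean excess return divided by the sample standard deviation."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess)

def sortino(returns, target=0.0):
    """Mean excess return divided by downside deviation only."""
    excess = [r - target for r in returns]
    downside = sqrt(mean([min(0.0, e) ** 2 for e in excess]))
    return mean(excess) / downside

# Hypothetical series: identical losses, but `spiky` has one large gain.
steady = [0.01, -0.01, 0.02, -0.01, 0.01, 0.02]
spiky = [0.01, -0.01, 0.02, -0.01, 0.01, 0.20]

for name, series in (("steady", steady), ("spiky", spiky)):
    print(f"{name}: Sharpe={sharpe(series):.3f}  Sortino={sortino(series):.3f}")
```

The `spiky` series is never worse than `steady` in any period, yet its Sharpe ratio comes out lower because the single large gain inflates the standard deviation, while its Sortino ratio comes out higher. Whether that is a flaw or a healthy bias toward caution is exactly the disagreement above.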

Sparohok · Apr 22, 2005

Quote from mind:

well. yes. the one trade per year entry might be a little drastic ...

Yeah.

I've discussed this topic a lot at the Motley Fool board I just mentioned, and they really do try to backtest strategies that trade only once a year for tax reasons (although they enter more than one position).

Martin

mind · Apr 22, 2005

agreed on the upside variance. in hindsight i always found it a pretty good estimate to think that what can happen upwards within a given period can happen south as well. applying that to standard deviation means that i prefer to be conservative and thus use sharpe. but, as i said, i am currently changing my point of view.
i think it is good to skip sharpe for something better - but to trade it for tradestation output is no option IMHO.

peace

Diamondtrim · Apr 22, 2005

Quote from mind:

really? when i saw the site i thought it was crap. you are trading based on this thinking?

You bet! Good analytics are hard to find.

prophet · Apr 22, 2005

Quote from bulat:

If you have a particular idea about some market behavior, then code it without doing extensive optimization, and find that it produces outstanding returns while passing various statistical significance tests, you almost certainly have a winner on your hands.

If you evaluate 1,000,000 random strategies and find that 100 of them produce good returns and pass all the statistical tests, you don't actually know which (if any) of the 100 strategies really work, and which produce good results purely by chance.

That's a bit of a contradiction. The whole point of doing the statistical tests, and more importantly of using a statistically significant data set (and/or a significant number of trades), is to estimate the probability that each of the 100 strategies will keep working when walked forward. Better yet, if you can automate optimization, try optimizing over repeated, independent trials.
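Both sides of this exchange can be illustrated with a small simulation. It is a sketch under assumptions that are not from the thread: every "strategy" is a pure coin flip with no edge, trades are independent, and the sizes (`n_strategies`, `n_trades`), the 5% cutoff, and the seed are arbitrary choices.

```python
import random
from math import comb

def binom_tail(wins, n):
    """P(X >= wins) for X ~ Binomial(n, 0.5), computed exactly."""
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n

def min_significant_wins(n, alpha):
    """Smallest win count whose one-sided p-value is <= alpha."""
    wins = n // 2
    while binom_tail(wins, n) > alpha:
        wins += 1
    return wins

def simulate(n_strategies=2000, n_trades=200, alpha=0.05, seed=1):
    rng = random.Random(seed)
    cutoff = min_significant_wins(n_trades, alpha)

    def coin_flip_wins():
        # A strategy with no edge: each trade wins with probability 0.5.
        return sum(rng.random() < 0.5 for _ in range(n_trades))

    in_sample = [coin_flip_wins() for _ in range(n_strategies)]
    survivors = [w for w in in_sample if w >= cutoff]
    # Independent second trial: fresh data for the survivors only.
    out_sample = [coin_flip_wins() for _ in survivors]
    return {
        "survivors": len(survivors),
        "pass_again": sum(w >= cutoff for w in out_sample),
        "in_sample_win_rate": sum(survivors) / (len(survivors) * n_trades),
        "out_sample_win_rate": sum(out_sample) / (len(out_sample) * n_trades),
    }

result = simulate()
print(result)
```

Some no-edge strategies pass the in-sample test purely by chance, which is bulat's point. On the independent second trial their win rate falls back toward 50% and few pass again, which is the kind of filtering that repeated, independent trials provide.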

Overoptimization comes from searching over too many degrees of freedom without having enough data points (trades or regular-time-interval returns) to rule out chance, or to rule out systems that simply don't generalize when walked forward. There is a balance between these two factors. If you add more degrees of freedom, you had better also test over a greater length of data and over more (uncorrelated) markets.
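One rough way to put numbers on that balance is a Bonferroni correction: divide the significance level by the number of strategy variants tried and see how many trades a given win rate then needs to clear it. This is a swapped-in textbook technique, not the method described in the post; it assumes independent trades, a fair-coin null hypothesis, a normal approximation to the binomial, and it treats "degrees of freedom" loosely as the number of variants searched. `trades_needed` is a hypothetical helper.

```python
from math import ceil
from statistics import NormalDist

def trades_needed(win_rate, n_strategies=1, alpha=0.05):
    """Approximate trades needed for `win_rate` to be significant
    at family-wise level `alpha` after testing `n_strategies` variants."""
    if not 0.5 < win_rate <= 1.0:
        raise ValueError("win_rate must be above 0.5 and at most 1.0")
    per_test_alpha = alpha / n_strategies          # Bonferroni correction
    z = NormalDist().inv_cdf(1.0 - per_test_alpha)  # one-sided critical value
    # Under the null the win-rate estimate has standard error 0.5 / sqrt(n).
    return ceil((z * 0.5 / (win_rate - 0.5)) ** 2)

for searched in (1, 100, 10_000, 1_000_000):
    print(f"{searched:>9} variants searched -> "
          f"{trades_needed(0.55, searched)} trades needed at a 55% win rate")
```

The required sample grows as the search widens, which is the sense in which more degrees of freedom call for more data. Bonferroni is conservative when the variants are correlated, so the figures are a rough guide only.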