Is data mining for trading patterns impossible?

bulat · Mar 22, 2005

Quote from QQQShort:

Personally, I prefer the odds offered by evaluating 1,000,000 random strategies compared to the handful that could be created manually in the same time.

If you have a particular idea about some market behavior, then code it without doing extensive optimization, and find that it produces outstanding returns while passing various statistical significance tests, you almost certainly have a winner on your hands.

If you evaluate 1,000,000 random strategies and find that 100 of them produce good returns and pass all the statistical tests, you don't actually know which (if any) of the 100 strategies really work, and which produce good results purely by chance.
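To put rough numbers on that, here is a minimal sketch of the multiple-testing effect. It assumes every "strategy" is just a series of random daily returns with zero true edge, and it uses a plain one-sample t-statistic as a stand-in for "the statistical tests". The function names, the 1% daily volatility and the 2.33 threshold (roughly a one-sided 1% level) are illustrative assumptions, not anything from an actual search engine.

```python
import random
import statistics


def t_stat(returns):
    """t-statistic of the mean return, tested against zero."""
    n = len(returns)
    mean = statistics.fmean(returns)
    sd = statistics.stdev(returns)
    return mean / (sd / n ** 0.5)


def count_lucky_strategies(n_strategies, n_days, t_threshold, seed=0):
    """Count zero-edge strategies whose t-statistic exceeds the threshold.

    Every strategy here is pure noise, so each one counted is a false positive.
    """
    rng = random.Random(seed)
    lucky = 0
    for _ in range(n_strategies):
        # Assumed: normally distributed daily returns, mean 0, 1% volatility.
        returns = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        if t_stat(returns) > t_threshold:
            lucky += 1
    return lucky


if __name__ == "__main__":
    n = 2_000
    lucky = count_lucky_strategies(n, n_days=250, t_threshold=2.33)
    print(f"{lucky} of {n} zero-edge strategies look significant")
```

At a 1% significance level you would expect about 1% of the noise strategies to pass, so scaling the search up to 1,000,000 candidates would leave on the order of 10,000 lucky survivors before any real edge enters the picture.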

vikana · Mar 22, 2005

The trick, I've found, is to impose a few semantic rules on top of the brute-force pattern search. This can be a very efficient way to find patterns that would have escaped you otherwise, while eliminating meaningless (or simply invalid) patterns.

mhashe · Mar 22, 2005

Quote from vikana:

The trick, I've found, is to impose a few semantic rules on top of the brute-force pattern search. This can be a very efficient way to find patterns that would have escaped you otherwise, while eliminating meaningless (or simply invalid) patterns.

Would you care to share an example of what you mean by "semantic rules"? Thanks.

QQQShort · Mar 22, 2005

Quote from bulat:

If you have a particular idea about some market behavior, then code it without doing extensive optimization, and find that it produces outstanding returns while passing various statistical significance tests, you almost certainly have a winner on your hands.

If you evaluate 1,000,000 random strategies and find that 100 of them produce good returns and pass all the statistical tests, you don't actually know which (if any) of the 100 strategies really work, and which produce good results purely by chance.

My pre-conceived ideas about market behavior are often wrong. This is why I let the computer do the research for me.

My searches are not optimization; I rarely use that technique. Instead, I search a broad variety of ideas using several years of data, most often for 100 - 2000 stocks.

vikana · Mar 23, 2005

Quote from mhashe:

Would you care to share an example of what you mean by "semantic rules"? Thanks.

Sure. Good examples include ensuring that the Low < High, and similar (obvious) relationships.

Slightly more interesting rules would be a layer of logic that guides the rule engine to search for certain types of patterns:
- how many days of data to look back
- which prices to use (Open and Close for instance)
- insistence on certain "common sense" things, such as volume > 500k/day
- the types of period in MAs you are interested in. For example, it makes little sense to look at a 5-day MA for a long-term system.
- exclude relationships that you don't consider valid, e.g. relationships between Corn and the S&P.

Hope this helps
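As a rough illustration of the rules listed above, here is a minimal sketch of a semantic filter sitting between a brute-force candidate generator and the backtester. Everything in it is hypothetical: the `Candidate` fields, the symbol names `CORN` and `SPX`, and the specific limits (60-day lookback, 20-day minimum MA) are assumptions chosen for a long-term system, not settings from a real rule engine.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Candidate:
    lookback_days: int
    price_fields: tuple
    ma_period: int
    symbols: tuple


def bar_is_valid(bar):
    """Obvious data checks: Low <= Open/Close <= High, and volume above 500k."""
    return (bar["low"] <= min(bar["open"], bar["close"])
            and max(bar["open"], bar["close"]) <= bar["high"]
            and bar["volume"] > 500_000)


# Hypothetical list of relationships the trader does not consider valid.
UNRELATED_PAIRS = {frozenset({"CORN", "SPX"})}


def passes_semantic_rules(c, max_lookback=60, min_ma_period=20):
    """Reject candidates that break any of the hand-written rules."""
    if c.lookback_days > max_lookback:
        return False
    if c.ma_period < min_ma_period:          # e.g. no 5-day MA in a long-term system
        return False
    if not set(c.price_fields) <= {"open", "close"}:
        return False
    if frozenset(c.symbols) in UNRELATED_PAIRS:
        return False
    return True


def generate_candidates():
    """Brute-force enumeration of a small, made-up search space."""
    lookbacks = [5, 20, 60, 250]
    fields = [("open", "close"), ("high", "low")]
    ma_periods = [5, 20, 50, 200]
    symbol_sets = [("SPX",), ("CORN", "SPX")]
    for lb, f, ma, s in product(lookbacks, fields, ma_periods, symbol_sets):
        yield Candidate(lb, f, ma, s)


candidates = list(generate_candidates())
kept = [c for c in candidates if passes_semantic_rules(c)]
print(f"{len(kept)} of {len(candidates)} candidates survive the rules")
```

With this toy search space, 9 of the 64 candidates survive, so the rules cut the number of backtests (and the number of chances to find a lucky pattern) before any data is touched.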

mhashe · Mar 23, 2005

Quote from vikana:

Sure. Good examples include ensuring that the Low < High, and similar (obvious) relationships.

Slightly more interesting rules would be a layer of logic that guides the rule engine to search for certain types of patterns:
- how many days of data to look back
- which prices to use (Open and Close for instance)
- insistence on certain "common sense" things, such as volume > 500k/day
- the types of period in MAs you are interested in. For example, it makes little sense to look at a 5-day MA for a long-term system.
- exclude relationships that you don't consider valid, e.g. relationships between Corn and the S&P.

Hope this helps

So why not move into the realm of AI? Have you looked at the OpenMind project (http://commonsense.media.mit.edu/cgi-bin/search.cgi)? You're thinking along the same lines. I've been trying to read up on AI, but my intellect defeats me.

winter · Mar 23, 2005

Great discussion.

I took a look at the OpenMind site. It's very interesting, but I don't see any applicability in its present form to system development. It's still very primitive, and what they are trying to accomplish is just to get the program to understand common-sense ideas that humans take for granted. They are trying to solve a much bigger problem than traders are, and since it's bleeding edge, I don't think it's productive to go down that path for system development.

As powerful as computers are, they still pale in comparison to so many things human brains can do. Consider that the best chess playing programs have just recently started beating the best human players, and chess is such a narrow, well-defined problem, one that computers are well-suited for.

Having computer programs that think and learn the way we do is still far off. In terms of trading, I would focus on limiting searches to things that make sense, which basically limits the field of search to raw data, derived indicators and operations that you think could have some merit.

If a trading system search engine discovered (through proper backtesting) a profitable system that involved using the phase of the moon, the day of the week and the number of times a block size of exactly 77 traded that day, how comfortable would you feel putting real money on such a system?

As someone else said, if you test 100,000,000 combinations and find 10,000 that are profitable and then test those 10,000 again using a different data set and find 100 that are still profitable, how likely is it that many of those 100 just happen to backtest well through both data sets and really have no meaningful edge whatsoever?
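The arithmetic behind that worry can be sketched in a few lines. The two pass rates below are purely hypothetical assumptions: suppose a strategy with no edge slips through the first backtest 1 time in 10,000 and through the second data set 1 time in 100, and that the two tests are independent.

```python
def expected_survivors(n_candidates, pass_rates):
    """Expected number of zero-edge strategies that pass every test by luck,
    assuming the tests are independent of each other."""
    expected = float(n_candidates)
    for rate in pass_rates:
        expected *= rate
    return expected


n_tested = 100_000_000

# Hypothetical chance-only pass rates for the two data sets.
luck_rate_first = 1e-4
luck_rate_second = 1e-2

after_first = expected_survivors(n_tested, [luck_rate_first])
after_both = expected_survivors(n_tested, [luck_rate_first, luck_rate_second])

print(f"lucky survivors after first data set:  {after_first:.0f}")
print(f"lucky survivors after second data set: {after_both:.0f}")
```

Under those assumed rates, luck alone accounts for roughly 10,000 and then 100 survivors, the same counts as in the example, so surviving two data sets says little by itself unless the chance-only pass rates are known to be much lower.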

It reminds me of the old adage that if you have a million monkeys pounding on keyboards, sooner or later, strictly by chance, one of them will produce the works of Shakespeare.

So far I have been sticking to the approach of coming up with ideas myself and giving my optimization/backtesting framework a wide berth in the number of parameters, to the point that parts of the system can be disabled if the optimization finds that disabling a particular parameter or filter produces the best results.
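One way to picture "parts of the system can be disabled" is a parameter grid in which every filter has `None` as one of its allowed values. The sketch below is a toy under stated assumptions, not the actual framework: the strategy (hold for one day whenever all enabled filters pass), the two filters and the function names are made up for illustration.

```python
from itertools import product


def moving_average(values, period, i):
    """Mean of the `period` values ending at index i (inclusive)."""
    window = values[i - period + 1 : i + 1]
    return sum(window) / period


def backtest(closes, volumes, ma_period=None, min_volume=None, start=0):
    """Sum of next-day returns on days when every enabled filter passes.

    A filter set to None is disabled and always passes.
    """
    begin = max(start, (ma_period or 1) - 1)  # need a full MA window
    total = 0.0
    for i in range(begin, len(closes) - 1):
        if ma_period is not None and closes[i] <= moving_average(closes, ma_period, i):
            continue
        if min_volume is not None and volumes[i] < min_volume:
            continue
        total += closes[i + 1] / closes[i] - 1.0
    return total


def grid_search(closes, volumes, ma_periods, min_volumes):
    """Try every combination, including disabled filters; return the best."""
    periods = [p for p in ma_periods if p is not None]
    start = max(periods, default=1) - 1  # same test window for every combination
    best = None
    for ma, vol in product(ma_periods, min_volumes):
        score = backtest(closes, volumes, ma, vol, start)
        if best is None or score > best[0]:
            best = (score, ma, vol)
    return best


if __name__ == "__main__":
    closes = [100.0, 101.0, 102.0, 101.0, 103.0, 104.0, 103.0, 105.0]
    volumes = [600, 400, 700, 800, 300, 900, 650, 500]
    score, ma, vol = grid_search(closes, volumes, [None, 2, 3], [None, 500])
    print(f"best score {score:.4f} with ma_period={ma}, min_volume={vol}")
```

Note that every optional filter multiplies the number of combinations tested, so this flexibility comes with the same risk of chance results discussed earlier in the thread.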

Check out http://www.stratasearch.com as well. I haven't really used it, but it may appeal to some who are looking for that approach.

Lawrence Chan · Mar 23, 2005

Back in my old days working for a fund, data mining was an everyday task.

Two findings are very discouraging to most people:

1. Continuously recurring patterns do exist and are very persistent over time, BUT their bias in predicting price movement is pretty small.

2. Very explosive patterns usually do not last: about 10 years at best. Most of them last only a year or two.

I am talking about intraday data. At the time this research was conducted, computing power was more limited than what we have today (pre-Pentium times), so the patterns exploited were on time frames of 10 minutes or longer.

This makes a lot of sense, both statistically and as common sense.
Patterns in #2 can be recognized easily by visual inspection, simple rules, etc., so market participants at the time can discover them easily and the patterns get discounted.

Patterns in #1 are discarded or ranked out when compared to those in #2 because they are less profitable.

Diamondtrim · Mar 23, 2005

Quote from vikana:

Sure. Good examples include ensuring that the Low < High, and similar (obvious) relationships.

Slightly more interesting rules would be a layer of logic that guides the rule engine to search for certain types of patterns:
- how many days of data to look back
- which prices to use (Open and Close for instance)
- insistence on certain "common sense" things, such as volume > 500k/day
- the types of period in MAs you are interested in. For example, it makes little sense to look at a 5-day MA for a long-term system.
- exclude relationships that you don't consider valid, e.g. relationships between Corn and the S&P.

Hope this helps

I have to agree with Hank. Your examples back up his post about TI: http://www.trade-ideas.com/Help.html#configure_window_specific_filters

Those are the settings I use to make the patterns more valid.

vikana · Mar 23, 2005

Quote from Diamondtrim:

I have to agree with Hank. Your examples back up his post about TI: http://www.trade-ideas.com/Help.html#configure_window_specific_filters

Those are the settings I use to make the patterns more valid.

When I worked on this extensively (around 2000), I never managed to find anything with a significant statistical edge. I did find a few strategies that were net positive (after commissions etc.), but they were not nearly as good as what I could design myself.

In the end I "retired" the approach.