How do you judge the degree of data-mining/over-fitting in a strategy?

Discussion in 'Data Sets and Feeds' started by mizhael, Jan 19, 2011.

  1. Okay, at the end of every trading day I get a realized PnL, which is one realization of a random variable.

    Collecting 10 years of such numbers, I get roughly 2,500 realizations of one distribution.

    And I can plot the empirical distribution for that random variable.

    But how does "stationarity" come into play here?

    And isn't it very possible that a highly curve-fitted strategy will still show a good-looking empirical distribution?
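    A rough way to make the stationarity question concrete: split the 10-year PnL sample in time and check whether the two halves look like draws from the same distribution. A minimal sketch in Python (the daily_pnl.csv file is a hypothetical stand-in for your own PnL records):

    ```python
    import numpy as np
    from scipy.stats import ks_2samp

    # Hypothetical input: one realized PnL per trading day, ~2,500 values over 10 years.
    pnl = np.loadtxt("daily_pnl.csv")

    # The empirical distribution is just the histogram of daily PnL.
    hist, edges = np.histogram(pnl, bins=50)

    # Crude stationarity check: does the first half of the sample resemble the second half?
    first, second = pnl[: len(pnl) // 2], pnl[len(pnl) // 2:]
    stat, p_value = ks_2samp(first, second)

    print(f"mean PnL, first half: {first.mean():.2f}  second half: {second.mean():.2f}")
    print(f"KS-test p-value: {p_value:.3f}  (small p suggests the distribution has shifted)")
    ```

    And yes, a heavily curve-fitted strategy can still show a beautiful full-sample histogram, which is exactly why a time split like this is worth doing.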
     
    #11     Jan 19, 2011
  2. NASDAQ has dynamic weights.

    And you take those weights, and then do some math,

    and you get your new weights,

    your new weights are "dynamic" because they inherit the "dynamics" of the NASDAQ...
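    A minimal sketch of that idea (the tickers, index weights and tilt signal below are made-up illustrations, not real NASDAQ data):

    ```python
    # Hypothetical NASDAQ-style constituent weights (they change as the index rebalances).
    index_weights = {"AAPL": 0.20, "MSFT": 0.15, "GOOG": 0.10, "AMZN": 0.08}

    # "Do some math": tilt each index weight by your own signal, then renormalize.
    signal = {"AAPL": 1.2, "MSFT": 0.8, "GOOG": 1.0, "AMZN": 1.1}  # hypothetical scores

    raw = {ticker: w * signal[ticker] for ticker, w in index_weights.items()}
    total = sum(raw.values())
    new_weights = {ticker: w / total for ticker, w in raw.items()}

    # The new weights are "dynamic" only because the index weights (and the signal) change over time.
    print(new_weights)
    ```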
     
    #12     Jan 19, 2011
  3. I don't see the connection between stationarity and robustness. Robustness usually means that key performance criteria remain above a certain threshold, not that they remain at the same level.
     
    #13     Jan 20, 2011
  4. They do come almost on a daily basis to the place I consult. As you say, results look great on paper.

    You must be. It's true that 99.99999% of those paper results are optimized or involve selection bias.

    This is the wrong question to ask, IMO. There is nothing evil about data-mining and curve-fitting. There is absolutely no theoretical proof that over-fitted systems fail solely by virtue of optimization or data-mining.

    Besides the obvious forward-testing solution to your problem, which takes time, there are other, quicker ways of analyzing the data to see whether these systems will fail in the future.

    I'll give you an example of what we do: we over-optimize systems on purpose so that they produce very good returns for about six months. We actually look for over-optimized systems. After a peak in equity we drop them and move on to the next one. You are on the wrong track because you read parrot authors and listen to parrot speakers.
     
    #14     Jan 20, 2011
  5. I agree. I do believe RenTec hires those PhD quants to do data-mining, curve-fitting and optimization.

    So there must be a place for these techniques.

    It's about using these techniques judiciously and controlling for the downside. They must have a good framework or process control.


    Forward testing does take too long. And it may not settle the question - if a strategy fails after 3 months of initial running, would you claim it doesn't work at all, or that it was just unlucky...?

    What are the other quick ways of analyzing the data to see if these systems will fail in the future?


    You are right - that's why I began to ask about the "controlled" use of optimized/over-fitted systems.

    So how do you decide the equity curve has peaked?

    If you have a way to decide the equity curve has peaked, then you also have a way to decide whether the SPX 500 has peaked, and then you have the holy grail!
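    For what it's worth, "peaked" in that sense is usually operationalized as a reactive drawdown-from-peak rule, not a prediction. A minimal sketch (the equity numbers and the 10% threshold are arbitrary examples):

    ```python
    import numpy as np

    def retire_after_drawdown(equity, max_dd=0.10):
        """Index of the first day the equity curve is more than max_dd below
        its running peak (the day you would drop the system), else None."""
        equity = np.asarray(equity, dtype=float)
        running_peak = np.maximum.accumulate(equity)
        drawdown = 1.0 - equity / running_peak
        hit = np.nonzero(drawdown > max_dd)[0]
        return int(hit[0]) if hit.size else None

    # Hypothetical equity curve: rises, then rolls over.
    equity = [100, 104, 109, 115, 121, 118, 112, 107, 103]
    print(retire_after_drawdown(equity))  # -> 7, the first day more than 10% off the peak
    ```

    Note that such a rule only recognizes the peak with a lag, which is also why the same trick applied to the SPX 500 is not a holy grail.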
     
    #15     Jan 20, 2011
  6. Specterx

    There's no foolproof way to know if something will work in advance. All you can do is employ robust development and testing practices:

    - Develop a system on a portion of the dataset, then test good performers on the remaining portion. If the performance differs significantly between the two sets, the system is likely curve-fitted (see the sketch after this list).

    - Forward test; 3 months sounds like a good minimum, but it depends on the parameters of the strategy (holding period etc.). The thing to look for is not whether the system performs well or badly but how it performs relative to expectations. If the forward test shows a normal, expected drawdown then it might be OK, but if your equity goes from a 45-degree upward curve in the backtest to a 45-degree downward one in the forward test, you're likely curve-fitted.

    - Check the system for magic numbers or optimized parameters (choices of MA or RSI values are the classic basic example). The more of these there are, the less robust the system tends to be. If variables/spreads/correlations etc. are used, work through logically what drives the evolution of these factors and understand what would cause the system to break down (see LTCM).

    - Diversify as much as possible with production systems. If one stops working at some point you've still got five or six others that are carrying you along.

    - Lastly, it can never hurt to understand exactly what a system is doing and why it's working or not. For example, an S&P trend-following system should have performed extremely well over the past 4-5 months, probably less well last summer. If you had a system for each condition (and some knowledge of trading and market movements) you could choose when to deploy a system, and when to switch it off based on the prospects for a favorable market.
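    A bare-bones sketch of the first point, the in-sample/out-of-sample split (the daily_returns.csv file, the toy trend rule and the parameter grid are hypothetical stand-ins for your own system):

    ```python
    import numpy as np

    returns = np.loadtxt("daily_returns.csv")  # assumed: one daily return per line
    split = int(len(returns) * 0.7)
    in_sample, out_sample = returns[:split], returns[split:]

    def sharpe(pnl):
        return np.sqrt(252) * pnl.mean() / (pnl.std() + 1e-12)

    def toy_trend_rule(rets, lookback):
        """Long the next day when the cumulative PnL is above its trailing mean."""
        curve = np.cumsum(rets)
        signal = np.array([curve[i - lookback:i].mean() < curve[i]
                           for i in range(lookback, len(rets) - 1)])
        return rets[lookback + 1:] * signal

    # Optimize the lookback on the in-sample portion only...
    best = max(range(5, 100, 5), key=lambda lb: sharpe(toy_trend_rule(in_sample, lb)))

    # ...then judge the chosen parameter on data the optimizer never saw.
    print("in-sample Sharpe    :", round(sharpe(toy_trend_rule(in_sample, best)), 2))
    print("out-of-sample Sharpe:", round(sharpe(toy_trend_rule(out_sample, best)), 2))
    # A large gap between these two numbers is the classic curve-fitting signature.
    ```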
     
    #16     Jan 21, 2011
  7. Hugin

    One obvious reason why this question gets asked is that it is trivial to create an optimized trading system with spectacular back-test results (or rather, results based on training data). The brutal truth is that almost ANY model combined with a powerful optimizer will show good back-test (training) results. You really, really hope that results will be at least nearly as good walk-forward, but most of the time they are not, and when they are, it is possibly just by chance.
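    To see how little a good training-set backtest proves, here is a small self-contained experiment (everything in it is hypothetical: pure random-walk prices and a toy moving-average crossover). Brute-forcing a few hundred parameter pairs typically finds a comfortably positive in-sample Sharpe even though there is no edge to find, while the same parameters do nothing special on fresh noise:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    prices = np.cumprod(1.0 + rng.normal(0.0, 0.01, 2500))  # ~10 years of pure noise

    def crossover_pnl(prices, fast, slow):
        """Daily PnL of a toy rule: long when the fast moving average is above the slow one."""
        fast_ma = np.convolve(prices, np.ones(fast) / fast, mode="valid")
        slow_ma = np.convolve(prices, np.ones(slow) / slow, mode="valid")
        n = min(len(fast_ma), len(slow_ma)) - 1
        signal = fast_ma[-n - 1:-1] > slow_ma[-n - 1:-1]   # yesterday's signal...
        return np.diff(prices)[-n:] * signal               # ...applied to today's move

    def sharpe(pnl):
        return np.sqrt(252) * pnl.mean() / (pnl.std() + 1e-12)

    # "Powerful optimizer": brute-force every fast/slow pair and keep the best one.
    grid = [(f, s) for f in range(2, 30) for s in range(40, 200, 10)]
    best = max(grid, key=lambda p: sharpe(crossover_pnl(prices, *p)))

    fresh = np.cumprod(1.0 + rng.normal(0.0, 0.01, 2500))  # new noise the optimizer never saw
    print("best in-sample Sharpe on noise:", round(sharpe(crossover_pnl(prices, *best)), 2))
    print("same parameters on fresh noise:", round(sharpe(crossover_pnl(fresh, *best)), 2))
    ```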

    The problem is that it is very hard to create models for trading systems, training set-up and system evaluation that can utilize data-mining/fitting and get good walk-forward results. So people blame data-mining/fitting.

    Regarding the OP's question, I have seen people use variance reduction to determine when to stop optimizing.

    Another possibility could be to look at Statistical Learning Theory (Vapnik-Chervonenkis) which has a strong connection to kernel theory. But this is very theoretical.

    Hugin
     
    #17     Jan 21, 2011
  8. Thanks, OP, for starting an interesting thread.

    IMHO, there is still a wide gulf between good strategy design based on sound ideas and a successful real-life implementation of that strategy. I am referring not to over-fitting or data-mining, but to unrealistic fill assumptions made by the strategy designer. Unless strategy design incorporates realistic order-book simulation, you'll need to work hard to understand how real-life results relate to the backtests, and to bring them better into line. This may be less important for swing trading (I assume), but it becomes increasingly important as you move to shorter and shorter timeframes.

    In the case of the strategy you are reviewing (and that prompted your post), if you believe realistic order-book simulation was not undertaken during its design, one test you can do is to estimate what the strategy stats would look like if the percentage of winners were much lower than claimed (e.g. the strategy got all the losers, but only a third, say, of the winners).

    Would the strategy still look good? If it does, you may be in good shape.

    If it doesn’t, you will have to work hard to achieve what the strategy vendor/designer claims is possible and, in fact, it may not be achievable...

    Good luck!
     
    #18     Jan 21, 2011
  9. I agree with this. It is hard to develop winning trading systems using any method. The failure rate is so high that people, through induction, blame the methods.

    There is another part of the story though. I use a data mining program and it has made me good money. I met someone in another forum and he happened to use the same exact program. He complained he was not making any money. After a while I realized that he hadn't even read the manual. That was amazing. He was running the program in the wrong mode.

    Now think about it. All those losers who never do their homework, and who far outnumber those who do (remember high school?), sort of enforce their opinion in the age of the internet. This is a great threat to society as a whole. I mean the fact that the 80% that produces 20% of the wealth tries to convince everyone that something is wrong with the way things are done.
     
    #19     Jan 21, 2011
  10. The exact opposite is true.

    Any Competitive Advantage that you might have...
    Will be the result of accumulating EXPERTISE...
    Almost always specific to a specialized market niche...
    And then leveraging that professional level EXPERTISE...
    Into an expertly executed trading strategy.

    Experts tend to be HIGHLY specialized...
    Being a generalist is pretty much worthless...
    Knowing 1,000 general market facts off CNBC is worthless...
    But devoting 10,000 hours (5 years) to something very specific...
    Will make you an EXPERT.

    The endless stream of people here...
    Dreaming up mechanical strategies using high school math...
    Are just dead ending themselves.
     
    #20     Jan 21, 2011