Backtesting time period

Discussion in 'Strategy Building' started by NachiketJoshi, Jan 6, 2014.

ronblack
- 754
  Posts
- 18
  Likes
Also, the number of trades matters. For 5-minute data you should have more than 3000 trades at a minimum.

#11 Oct 25, 2014

Murray Ruggiero Sponsor
- 1,997
  Posts
- 30
  Likes
PIFFIPTradingStrategies said:
2 months of data seems far too little. I used 10 years of 1-minute historical data to backtest my strategy, to ensure it had passed through all possible scenarios and conditions (bullish / bearish / cyclic / noisy ...).

Yes, it's really important that you have at least one bear and one bull market in your data set. I would say 10 years is required. I use 10-15 years of data even on my intra-day strategies.

Contributing Editor Futures Magazine and Vice President of Research and Development for TradersStudio and UsingEasyLanguage.com. Consultant for Neblio (NEBL) Enterprise Block Chain Platform.

#12 Oct 28, 2014

NachiketJoshi
- 23
  Posts
- 1
  Likes
ronblack said:
Also the number of trades matters. For 5-minute data you should have more than 3000 trades at minimum

Can you explain how you arrived at the number 3000?

#13 Oct 28, 2014

abattia
- 1,169
  Posts
- 24
  Likes
One of the biggest threats to successful optimization/backtesting is the possibility that the results are just a curve-fit (or "over-fit", if you use that instead …)

i.e. they are just the strategy settings that would have worked best by chance over the historical data used, but contain no “intelligence” about price action or market structure that will enable the strategy to perform well if future price action is different … which of course it will be…

... like a lottery winner wins by chance but has no edge in terms of winning again in the future…

One way to have a shot at beating the curve fitting genie is to make use of “In Sample” and “Out of Sample” data. Optimization is undertaken using the “In Sample” data, and then the optimized settings are used to simulate performance in the “Out of Sample” data. Then statistics is used to compare the two populations (i.e. Population 1 - “In Sample” trades and their associated performance statistics, and Population 2 - “Out of Sample” trades and their statistics). With statistics, significance levels are established to test the null hypotheses that - say - the “Out of Sample” average trade, “Out of Sample” PF, etc. is the same as the “In Sample” average trade, etc.

That is, confidence levels are established to test the hypotheses that Out of Sample performance is the same as In Sample performance. If it is, then this strengthens the possibility that the settings identified “In Sample” reflected [i.e. “intelligently”] some aspect of market structure or behavioural finance that allowed the strategy to perform the same way “Out of Sample”, too. If not, then the probability increases that the “In Sample” findings were no more than a meaningless curve-fit. In which case we abandon them and move on to search for alternative settings.

So the question of “how many trades are needed for a backtest?” can also be answered by approaching it from the above angle: … how many trades are required to produce two big enough trade populations (“In Sample” and “Out of Sample”) that statistics can be used to attack the curve fitting genie at whatever significance level (e.g. 90%, 99%, 99.9%, etc.) is required.
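One way to make the In Sample vs. Out of Sample comparison above concrete is a two-sample test on the average trade. The sketch below is only an illustration, not a method anyone in this thread necessarily uses. It assumes trades are independent, uses a Welch-style statistic with a normal approximation (reasonable only for fairly large samples), and the function name `compare_average_trade` and the simulated P&L lists are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, variance


def compare_average_trade(in_sample, out_of_sample):
    """Two-sided test of the null hypothesis that both trade lists
    have the same mean P&L.

    Uses a Welch-style statistic (unequal variances allowed) with a
    normal approximation, so it assumes independent trades and
    reasonably large samples. Returns (z, p_value).
    """
    n1, n2 = len(in_sample), len(out_of_sample)
    # Standard error of the difference between the two sample means.
    se = math.sqrt(variance(in_sample) / n1 + variance(out_of_sample) / n2)
    z = (mean(in_sample) - mean(out_of_sample)) / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical per-trade P&L, simulated purely for illustration.
random.seed(0)
in_sample_pnl = [random.gauss(12.0, 100.0) for _ in range(1500)]
out_of_sample_pnl = [random.gauss(10.0, 100.0) for _ in range(1500)]

z, p = compare_average_trade(in_sample_pnl, out_of_sample_pnl)
print(f"z = {z:.3f}, p-value = {p:.3f}")
```

A small p-value suggests the average trade differs between the two periods. A large one only means the data cannot tell them apart, which is weaker than showing they are the same, and it is one reason the number of trades in each sample matters.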

Last edited: Oct 28, 2014

#14 Oct 28, 2014


SimpleMeLike and NachiketJoshi like this.
Sergio77
- 798
  Posts
- 90
  Likes
abattia said:
One way to have a shot at beating the curve fitting genie is to make use of “In Sample” and “Out of Sample” data. Optimization is undertaken using the “In Sample” data, and then the optimized settings are used to simulate performance in the “Out of Sample” data. Then statistics is used to compare the two populations (i.e. Population 1 - “In Sample” trades and their associated performance statistics, and Population 2 - “Out of Sample” trades and their statistics). With statistics, significance levels are established to test the null hypotheses that - say - the “Out of Sample” average trade, “Out of Sample” PF, etc. is the same as the “In Sample” average trade, etc.

Am I right that there is some confusion here regarding the terms sample and population? I think a sample is what you get from backtests, while the population is what defines the actual distribution of your system returns. I think someone else mixed those up earlier, too.

http://www.dissertation-statistics.com/population-sample.html

abattia said:
so the question of “how many trades are needed for a backtest?” can also be answered by approaching it from the above angle: … how many trades are required to produce two big enough trade populations (“In Sample” and “Out of Sample”) that statistics can be used to attack the curve fitting genie at whatever significance level (e.g. 90%, 99%, 99.9%, etc.) is required.

Again, I think the way to put it is : "how many trades are required to produce two big enough trade samples"

The answer IMO is many thousands because the actual population of trades is very large.

I just saw this but I am not sure I fully understand it. Can this bound be used to determine the number of trades based on win rate?

Last edited: Nov 1, 2014

#15 Nov 1, 2014

abattia
- 1,169
  Posts
- 24
  Likes
Sergio77 said:
... Again, I think the way to put it is : "how many trades are required to produce two big enough trade samples" ...

Yes, you're right, I mixed up sample and population. Thanks!

#16 Nov 1, 2014

abattia
- 1,169
  Posts
- 24
  Likes
Sergio77 said:
... I just saw this ...

Thanks!

Sergio77 said:
...Can this bound be used to determine the number of trades based on win rate?

It probably could if you equate p to win rate ...

Referring to wikipedia entry for Chernoff bound ... "the Chernoff bound requires that the variates be independent". So you have to assume that trades are independent of each other, and ignore serial correlation. So it is just a rough approximation to what is really happening, but may still be useful ...
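As a rough illustration of equating p to win rate, here is a minimal sketch that swaps in Hoeffding's inequality, a Chernoff-type bound, which is not necessarily the exact bound linked above. For n independent trades, the observed win rate differs from the true win rate by epsilon or more with probability at most 2·exp(−2·n·epsilon²). The names `trades_needed`, `epsilon` (margin on the win rate) and `delta` (allowed probability of being outside that margin) are assumptions made for this example, and the independence caveat above still applies.

```python
import math


def trades_needed(epsilon, delta):
    """Smallest n such that 2 * exp(-2 * n * epsilon**2) <= delta.

    epsilon: acceptable error on the estimated win rate (0.05 = 5 points).
    delta:   acceptable probability that the error exceeds epsilon.
    Assumes trades are independent win/lose outcomes (no serial correlation).
    """
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must be between 0 and 1")
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


# Hypothetical tolerances, chosen only to show how fast n grows.
for eps, dlt in [(0.05, 0.05), (0.02, 0.05), (0.02, 0.01)]:
    print(f"epsilon={eps}, delta={dlt}: n >= {trades_needed(eps, dlt)}")
```

Under these assumptions, pinning the win rate down to within 5 percentage points with 95% confidence takes 738 trades. The bound is distribution-free and therefore conservative, so it gives an upper estimate rather than a tight requirement.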

Last edited: Nov 1, 2014

#17 Nov 1, 2014

abattia
- 1,169
  Posts
- 24
  Likes
Sergio77 said:
I just saw this but I am not sure I fully understand it. Can this bound be used to determine the number of trades based on win rate?

Thinking further about this, I think it effectively gives a way to estimate how many trades you need in each of your In Sample and Out Of Sample "samples" to be "(100 - epsilon) % certain" that, say, the two average trade measurements can be compared. ...

#18 Nov 4, 2014

dtrader98
- 1,927
  Posts
- 69
  Likes
Some of you are focusing way too much on the statistics, without really seeing the forest for the trees.
If you have only a small subset of data, there is no reason to believe the same conditions will necessarily persist on a new small subset of data. Market properties change a lot, and therefore a lot of market conditions should be required to give you more confidence, not a lot of trades in a small subset of market conditions.

#19 Nov 4, 2014


SimpleMeLike likes this.
d08
- 7,721
  Posts
- 5,450
  Likes
The number of parameters is also crucial. A strategy with 10+ parameters and 2000 trades is more curve-fit than a strategy with 3 parameters and 200 trades. The less broad the strategy, the better it tends to be going forward.

#20 Nov 5, 2014


donedge likes this.
