Radically different results based on datasource?!

BrooksRimes · Jun 6, 2005

I have a 5 min system on the Dow mini that I backtested using data from myTrack/Trackdata and got good results.

I didn't have enough data so I obtained more data from another source.

To check the data, I ran the system for the same date range. The difference was huge. One test had 22 trades, the other had 50!

So I brought the 5-minute quotes into Excel and compared them. There were minor differences in the OHLC numbers. When I summed the data, there was only a 1-tick difference on the O & C.
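For anyone who wants to repeat the comparison outside Excel, here is a minimal sketch that lines up two vendors' bars by timestamp and reports the OHLC differences in ticks. The `Bar` type, the `compare_bars` name, and the 1.0-point tick size for the mini Dow are assumptions for illustration. Note that it matches bars on their timestamp label, so it will only show small price differences if both feeds label bars the same way.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Bar:
    """One OHLC bar; 'time' is whatever label the vendor attached to it."""
    time: datetime
    open: float
    high: float
    low: float
    close: float


def compare_bars(a, b, tick=1.0):
    """Return {timestamp: (dO, dH, dL, dC)} in ticks for bars that differ.

    'tick' is the instrument's minimum price increment (assumed 1.0 index
    point for the mini Dow). Bars present in only one feed are skipped.
    """
    by_time_b = {bar.time: bar for bar in b}
    diffs = {}
    for bar in a:
        other = by_time_b.get(bar.time)
        if other is None:
            continue
        d = tuple(
            round((x - y) / tick)
            for x, y in (
                (bar.open, other.open),
                (bar.high, other.high),
                (bar.low, other.low),
                (bar.close, other.close),
            )
        )
        if any(d):
            diffs[bar.time] = d
    return diffs
```

A pair of bars that differ by one tick on the high would come back as `(0, -1, 0, 0)` for that timestamp; identical feeds return an empty dict.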

Are these types of variations normal? What accounts for them?

Does this lead developers to create systems on shorter bars to remove these variations?

Makes you wonder which system to trade.

Thanks.

mmillar · Jun 6, 2005

It's telling you your system is fitted to the data. Don't trade it at all!

BrooksRimes · Jun 6, 2005

With my original data, I optimized only on the middle 1/3 of the data. It shouldn't be fitted.

mmillar: Do you ever run the same backtest using different data sources? If so, do they come out close?

Brooks

Quote from mmillar:

It's telling you your system is fitted to the data. Don't trade it at all!

BrooksRimes · Jun 6, 2005

I did another comparison with 1-minute data. Things seem to vary a lot minute by minute but even out by the end of the day.

For instance, vendor 1 showed a big volume spike at 10:51 that the 2nd vendor reported at 10:52.

Because the entry is based on candlestick patterns, this will make the difference on whether the pattern even forms or not.

It seems clear that you must go live with the same data source that you backtest with. Any thoughts or comments on this?

Brooks

j1900q · Jun 6, 2005

I use myTrack data each day. I have found it matches my IB data.
Keith

maxpi · Jun 6, 2005

myTrack data seems to be good. TradeStation data seems good to me so far, but one little thing: they timestamp a bar with the ending time of the bar, not the beginning time. Maybe one of your software packages does that?
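If one feed stamps bars at their close and the other at their open, one way to reconcile them is to convert everything to a single convention before testing. A minimal sketch, assuming bars are plain `(timestamp, open, high, low, close)` tuples of fixed length; the function name `to_start_time_labels` is hypothetical:

```python
from datetime import datetime, timedelta


def to_start_time_labels(bars, bar_length=timedelta(minutes=5)):
    """Relabel bars stamped with their closing time so they carry their opening time.

    Assumes every bar is exactly 'bar_length' long and stamped at its close;
    the prices themselves are left untouched.
    """
    return [(ts - bar_length, *rest) for ts, *rest in bars]
```

With 1-minute bars, a bar one feed labels 10:52 (close) becomes 10:51 (open), which is the kind of one-bar offset that makes a volume spike appear a minute apart in two feeds.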

Data is more than just a little problematic. Compare the EOD data with some intraday data from the same vendor; in a lot of cases you will see discrepancies in the daily OHLC depending on which data you use. All vendors probably send bad ticks in real time. Some include filtering in their software to flag bad ticks, and they fix the bad ticks when the exchange sends out a correction, but that won't stop a bad tick from affecting your strategy. Some vendors send only raw data without any corrections at all.
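The EOD-versus-intraday check can be automated by building daily bars from the intraday data and diffing them against the vendor's EOD bars. A minimal sketch, assuming sorted `(timestamp, open, high, low, close)` tuples and day-session data where one calendar date equals one session (not true for overnight sessions); both function names are hypothetical:

```python
from datetime import date, datetime


def daily_from_intraday(bars):
    """Aggregate sorted intraday bars into {date: (open, high, low, close)}."""
    days = {}
    for ts, o, h, l, c in bars:
        d = ts.date()
        if d not in days:
            days[d] = [o, h, l, c]  # first bar of the day sets the open
        else:
            day = days[d]
            day[1] = max(day[1], h)  # running session high
            day[2] = min(day[2], l)  # running session low
            day[3] = c               # last bar seen so far sets the close
    return {d: tuple(v) for d, v in days.items()}


def eod_mismatches(intraday_daily, eod):
    """Return {date: (eod_bar, intraday_bar)} wherever the two disagree."""
    return {
        d: (eod[d], bar)
        for d, bar in intraday_daily.items()
        if d in eod and eod[d] != bar
    }
```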

I prefer to put a bad tick filter right in my strategy: if something is too far out of line, I won't act on it unless a confirming tick follows it. If I ever go full auto, I will probably put data in an array in the strategy as it comes in and clean it based on non-confirmation. If you use indicators like stochastics that rely on highs/lows, you can get really wild readings from a bad tick.
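The confirm-before-acting idea above might look something like this. It is a minimal sketch, not Max's actual code: the class name and the fixed `max_jump` threshold are assumptions, and a real filter would probably scale the threshold to the instrument's volatility.

```python
class BadTickFilter:
    """Hold back a tick that jumps too far from the last accepted price
    until the next tick confirms it.

    'max_jump' (in price units) is an assumed, instrument-specific threshold.
    """

    def __init__(self, max_jump):
        self.max_jump = max_jump
        self.last = None     # last accepted price
        self.pending = None  # suspicious price awaiting confirmation

    def update(self, price):
        """Feed one raw tick; return the list of prices now safe to act on."""
        if self.last is None:
            self.last = price
            return [price]
        if self.pending is not None:
            suspect, self.pending = self.pending, None
            if abs(price - suspect) <= self.max_jump:
                # The next tick agrees with the suspect, so accept both.
                self.last = price
                return [suspect, price]
            # Suspect was not confirmed: drop it and judge this tick on its own.
        if abs(price - self.last) > self.max_jump:
            self.pending = price
            return []
        self.last = price
        return [price]
```

Fed the ticks 10450, 10452, 10600, 10453, 10480, 10482 with `max_jump=10`, the isolated 10600 print is discarded, while the move to 10480 is released once 10482 confirms it.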

Max

AAAintheBeltway · Jun 6, 2005

Certainly you can get major differences in derivative data such as TICK and TRIN. If your system uses them, I think it's very important to test using the same data you will be using to trade.

The old TrackData system had a proprietary intraday indicator for the S&P that worked like magic. Do they still provide it?

mmillar · Jun 6, 2005

Quote from BrooksRimes:

With my original data, I optimized only on the middle 1/3 of the data. It shouldn't be fitted.

mmillar: Do you ever run the same backtest using different data sources? If so, do they come out close?

Brooks

I have in the past but I don't do it on a regular basis. No two sets of data will ever produce the same results - but whether they both produce generally profitable results depends on how 'good' the system is. You shouldn't get hung up on differences between different data sources because a) they're all different and b) data in the future will be different anyway. However, if one set of data produces profits and the other losses you need to be very cautious.

There's an old post around comparing a system that made loads of money running on eSignal but lost loads using TradeStation!

BrooksRimes · Jun 6, 2005

That's interesting! So what were the conclusions from that?

Quote from mmillar:

There's an old post around comparing a system that made loads of money running on eSignal but lost loads using TradeStation!

BrooksRimes · Jun 6, 2005

Solved this problem. Turns out that one datasource uses the time of the opening of the bar and the other uses the time of the closing of the bar.

I'm still surprised at how this "slight" difference makes such a big difference in a system based on candlestick patterns. 22 trades with the time one way and 50 trades with the time the other way.

Brooks
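Before comparing two sources, one way to check for this kind of labeling offset is to slide one feed's volume series against the other's and see which shift lines them up best, much like the 10:51 versus 10:52 volume spike earlier in the thread. A minimal sketch, assuming equal-length per-bar volume lists for the same session; the function name `best_lag` and the mean-absolute-difference score are illustrative choices (correlation would also work):

```python
def best_lag(vol_a, vol_b, max_lag=2):
    """Return the shift k (in bars) that best aligns vol_b with vol_a.

    A result k means vol_b[i + k] lines up with vol_a[i]. k = 1 is what you
    would expect if feed B stamps bars at their close and feed A at their open.
    """
    best_k, best_cost = 0, float("inf")
    n = len(vol_a)
    for k in range(-max_lag, max_lag + 1):
        pairs = [(vol_a[i], vol_b[i + k]) for i in range(n) if 0 <= i + k < n]
        if not pairs:
            continue  # shift larger than the series; nothing to compare
        cost = sum(abs(x - y) for x, y in pairs) / len(pairs)
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k
```

A nonzero result is a hint to relabel one feed before backtesting, rather than proof of a convention difference; genuinely different tick data can also shift volume between adjacent bars.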