Data Provider for Historical "Simulation" of Automated Trading with TickZoom?

greaterreturn · Jan 12, 2009

Does anyone know if the data providers like iqFeed, eSignal, or any others produce complete market data?

TickZoom is free and will be released at midnight on January 18th.

TickZoom's specialty is collecting tick data in a compressed binary format, with full quote, trade, and 5-level DOM data for every change of the 1st-level DOM and/or every trade.

That makes it possible to do what is better called historical "simulation" rather than historical or back testing.

That is because the strategies being tested can run under simulated real-life conditions.

So do any data providers offer that kind of entire market data?

As the author of TickZOOM, I personally use MB Trading, which does offer that kind of full, detailed data with high performance and high quality.

But many who want to use TickZOOM dislike that MB Trading's API is Windows-only and not Linux friendly.

So are there data providers with a Linux API? TickZOOM can run on Linux since it uses C# 2.0, which Mono supports.

FYI, it may seem relevant to your answer that TickZOOM is written in C#, but it can also call out to native code or Java if necessary.

What's most relevant is that the data providers be Linux friendly and have that kind of total data feed.

Wayne

mikesmithv · Jan 13, 2009

Interactive Brokers is "Linux friendly" in that it depends on Java and the API does support .NET. I don't have direct experience with .NET since I use Java so others will have to comment on the quality of the .NET support. From what I can tell in the forums .NET is as well supported as the Java API which isn't saying much! Everyone likes to kick IB (as in that last statement) but overall I really like IB. The tick data is "aggregated" meaning you don't necessarily get every trade price/size pair because they may be combined with other trades at the same price depending on bandwidth. Some say it is actually worse than that but my experience is that IB is "good enough". At least the tick data never lags during high volume because the aggregation guarantees a fairly constant bandwidth requirement. Other low cost quote servers that do report every tick suffer from lag during high volume, but that too is what I have heard on these forums so take it with a grain of salt.

You can test IB's API using their demo account for free. The market data is not real but it is great for testing the API, even on weekends or the middle of the night when stocks are not even trading. I know it reports bid, ask & last price and I think 1st level DOM (not sure since I don't use it). If you just focus on supporting two quote servers for now then your code should be general enough for "n" servers (and you will be supporting the quote server I use, heh heh). I would hate to see you getting bogged down supporting every mom & pop quote server at the expense of seeing some kind of finished product that can be used for real trading.

dcraig · Jan 13, 2009

Have a look at OpenTick:

http://www.opentick.com

It may not be good enough for high frequency trading yet, but some day they will get it all together. The newish beta service http://beta.opentick.com is supposed to be a lot quicker for streaming data, though I haven't tried it yet.

There is quite a lot of historical data available, including tick data and supposedly book data. The last couple of years' data is better than the earlier stuff.

The protocol is documented and open so you can use it in just about any environment. If you are using Java I recommend the alternative client library written by one of the users:

http://www.opentick.com/forum/viewtopic.php?t=2342

There is some silliness with dropped connections in the "official" Java library, and the alternative does it a lot better.

mikesmithv · Jan 13, 2009

Wayne can correct me if I'm wrong but I think TickZoom will only be supporting real-time data, never historical data. If I understand his philosophy correctly, you always record the real-time data downloaded during real trading and then "play back" that same data when you backtest. That way you are always backtesting using the exact data used during the original session. This allows you to replay a session over and over exactly as it happened. Not only does it simplify the design of the ATS but it improves the accuracy of the simulations. It means Wayne must "wash" the data himself and create any kind of bar data required out of tick data (not rocket science) but the advantages are numerous. Just having access to the bid and ask at any moment is a big one. That also means the strategy must deal with data disconnects (which it should do anyway) since any temporary disconnects are preserved in all their glory in your downloaded data.
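As a rough illustration of "creating bar data out of tick data" from a recorded session, here is a minimal Python sketch. The `Tick` and `Bar` classes and the `build_bars` function are hypothetical names made up for this example; they are not TickZoom's API or file format, and times are assumed to be whole seconds.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Tick:
    time: int    # seconds since some epoch (assumed for this sketch)
    bid: float
    ask: float
    last: float  # last traded price
    size: int    # size of the last trade


@dataclass
class Bar:
    start: int   # start time of the bar interval
    open: float
    high: float
    low: float
    close: float
    volume: int


def build_bars(ticks: Iterable[Tick], seconds: int = 60) -> List[Bar]:
    """Aggregate recorded ticks (in time order) into fixed-interval bars."""
    bars: List[Bar] = []
    for t in ticks:
        start = t.time - (t.time % seconds)  # interval this tick falls in
        if not bars or bars[-1].start != start:
            # First tick of a new interval opens a new bar.
            bars.append(Bar(start, t.last, t.last, t.last, t.last, t.size))
        else:
            bar = bars[-1]
            bar.high = max(bar.high, t.last)
            bar.low = min(bar.low, t.last)
            bar.close = t.last
            bar.volume += t.size
    return bars
```

Because the bars are rebuilt from the recorded ticks on every replay, a disconnect in the recording simply shows up as an interval with no bar, exactly as it did in the live session.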

JSystemTrader http://code.google.com/p/jsystemtrader/ is a good product (and free) for using historical data from OpenTick or Interactive Brokers, but I had to write my own system to be able to play and replay market data the way TickZoom does. It was just too much work to twist JSystemTrader into that kind of system when it just was not designed that way. I just wish TickZoom had been around two years ago; it would have saved me a lot of work!

greaterreturn · Jan 13, 2009

Quote from mikesmithv:

Wayne can correct me if I'm wrong but I think TickZoom will only be supporting real-time data, never historical data. If I understand his philosophy correctly, you always record the real-time data downloaded during real trading and then "play back" that same data when you backtest. That way you are always backtesting using the exact data used during the original session. This allows you to replay a session over and over exactly as it happened. Not only does it simplify the design of the ATS but it improves the accuracy of the simulations. It means Wayne must "wash" the data himself and create any kind of bar data required out of tick data (not rocket science) but the advantages are numerous. Just having access to the bid and ask at any moment is a big one. That also means the strategy must deal with data disconnects (which it should do anyway) since any temporary disconnects are preserved in all their glory in your downloaded data.

TickZoom works with historical data also. It's just that the tick objects only carry whatever data is in the historical source, which is usually less than what the real-time feed provides. For example, my historical data only has bid prices, so I create regular Tick objects with everything else zeroed out except the ask and spread, which I put in at an average level.
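A minimal Python sketch of that idea, assuming "an average level" means adding a fixed average spread to each historical bid to get the ask. The `Tick` fields and the `ticks_from_bids` helper are hypothetical names for illustration only, not TickZoom's actual classes.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Tick:
    time: int
    bid: float
    ask: float = 0.0
    spread: float = 0.0
    last: float = 0.0   # unknown in bid-only history, left zeroed out
    size: int = 0       # unknown in bid-only history, left zeroed out


def ticks_from_bids(rows: Iterable[Tuple[int, float]],
                    avg_spread: float) -> List[Tick]:
    """Turn (time, bid) rows into Tick objects with a synthetic ask.

    The ask is not real market data: it is the bid plus an assumed
    average spread, so fills simulated against it are approximate.
    """
    return [
        Tick(time=time, bid=bid, ask=bid + avg_spread, spread=avg_spread)
        for time, bid in rows
    ]
```

A strategy can then read `bid` and `ask` the same way for recorded and historical ticks, while the zeroed fields make it visible that trade data was not available.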

Quote from mikesmithv:

JSystemTrader http://code.google.com/p/jsystemtrader/ is a good product (and free) for using historical data from OpenTick or Interactive Brokers, but I had to write my own system to be able to play and replay market data the way TickZoom does. It was just too much work to twist JSystemTrader into that kind of system when it just was not designed that way. I just wish TickZoom had been around two years ago; it would have saved me a lot of work!

You're absolutely right about the value of the recorded real time data vs. the bland historical stuff you can usually get.

Wayne

mikesmithv · Jan 13, 2009

Maybe I am being too nitpicky but my simulator looks at the bid and the ask when simulating trades. If I am simulating a buy order then I wait for the first trade that occurred in real time where the last price matches the ask price and I "pretend" that was my order being filled (ignoring the quantity, not a problem at my level of trading!). So making everything the bid would make simulated trading a bit off. Maybe you handle that differently.
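The fill rule described above can be sketched in a few lines of Python. The buy side follows the description; the sell side (fill at the first trade where the last price equals the bid) is assumed by symmetry and is not stated in the post. `Tick` and `simulate_fill` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Tick:
    time: int
    bid: float
    ask: float
    last: float  # price of the most recent trade


def simulate_fill(ticks: Iterable[Tick], side: str,
                  placed_at: int) -> Optional[Tick]:
    """Return the recorded tick at which a simulated order would fill.

    A buy fills at the first trade at or after `placed_at` whose last
    price equals the ask; a sell (assumed) fills when last equals the bid.
    Quantity is ignored. Returns None if no such trade was recorded.
    """
    for t in ticks:
        if t.time < placed_at:
            continue
        # Exact float comparison is fine for a sketch; real code would
        # more likely compare prices as integer multiples of the tick size.
        if side == "buy" and t.last == t.ask:
            return t
        if side == "sell" and t.last == t.bid:
            return t
    return None
```

With bid-only data where the ask is synthetic, the `t.last == t.ask` test can never be trusted, which is why this rule needs recorded bid, ask, and last prices.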

Generally a strategy that depends on bar data for an indicator can use the bar data "as is" from the quote server, but it should be saved at the start of the session in which you used the data. This is especially important when a stock splits (or must be adjusted for dividends). When you download historical data anytime after that it may be different. The quote server companies constantly clean their data to remove reported spikes or other anomalies, which is a good thing if you want "pure" data. It is not a good thing for re-running a session against what is supposed to be the same data to see how your new & improved strategy tests out.

That means the day before the split you have X days of historical data reflecting the pre-split price plus the market data for that day. The day after the split you have the newly downloaded X days of historical data (split adjusted) which you used for trading plus the market data for that day. You can re-run each day using the exact data that occurred.
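One way to picture this is a per-session archive that stores the historical bars exactly as downloaded that morning, next to that day's recorded market data. The sketch below is only an illustration in Python; `SessionArchive`, its methods, and the sample prices in the usage lines are all hypothetical and describe neither poster's actual system.

```python
from typing import Dict, List, Tuple


class SessionArchive:
    """Keeps, per trading day, the history as downloaded plus that day's ticks."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Tuple[List[float], List[float]]] = {}

    def record(self, day: str, history: List[float],
               ticks: List[float]) -> None:
        # Store copies so later re-downloads or adjustments by the
        # caller cannot change what this session actually saw.
        self._sessions[day] = (list(history), list(ticks))

    def replay(self, day: str) -> Tuple[List[float], List[float]]:
        """Return the exact (history, ticks) pair used on `day`."""
        history, ticks = self._sessions[day]
        return list(history), list(ticks)


# Hypothetical 2-for-1 split between the two sessions.
archive = SessionArchive()
archive.record("day-before-split", history=[100.0, 102.0], ticks=[101.0])
archive.record("day-after-split", history=[50.0, 51.0, 50.5], ticks=[50.75])
```

Replaying "day-before-split" returns the unadjusted prices and replaying "day-after-split" returns the split-adjusted ones, so each test run sees the data that was in use on that day.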

Having said all that, that is NOT what my system does regarding splits. If I ever go back to swing trading stocks or use bar-based indicators, that's what it will do, however. I just have not implemented that yet. So this is a case of "do as I say", not "do as I do".

greaterreturn · Jan 18, 2009

Quote from mikesmithv:
Maybe I am being too nitpicky but my simulator looks at the bid and the ask when simulating trades. If I am simulating a buy order then I wait for the first trade that occurred in real time where the last price matches the ask price and I "pretend" that was my order being filled (ignoring the quantity, not a problem at my level of trading!). So making everything the bid would make simulated trading a bit off. Maybe you handle that differently.

Point is that TickZOOM works either way, depending on the quality of your data. Does your data have real bid, ask, and last trade? TickZOOM collects it this way but most people don't have simulation-quality data going very far back in time.

Quote from mikesmithv:

Generally a strategy that depends on bar data for an indicator can use the bar data "as is" from the quote server, but it should be saved at the start of that session when you used the data. This is especially important when a stock splits (or must be adjusted for dividends).

I understand all these concepts but somehow I'm not following. Maybe an example would help.

Quote from mikesmithv:

When you download historical data anytime after that it may be different. The quote server companies constantly clean their data to remove reported spikes or other anomalies, which is a good thing if you want "pure" data. It is not a good thing for re-running a session against what is supposed to be the same data to see how your new & improved strategy tests out.

Agreed! If that's your point, you're totally right because when running live 24/7 you have to operate on real time data which is very often "dirty".

Quote from mikesmithv:

That means the day before the split you have X days of historical data reflecting the pre-split price plus the market data for that day. The day after the split you have the newly downloaded X days of historical data (split adjusted) which you used for trading plus the market data for that day. You can re-run each day using the exact data that occurred.

Okay, your point is clearer now. The idea is that you want to keep history such that when you run a test on any day prior to the split, it uses pre-split data, but on days after the split it uses post-split data.

Or in other words, you don't want to retroactively adjust your historical data which would alter any tests prior to the splits?

Is that right?

Quote from mikesmithv:

Having said all that, that is NOT what my system does regarding splits. If I ever go back to swing trading stocks or use bar-based indicators, that's what it will do, however. I just have not implemented that yet. So this is a case of "do as I say", not "do as I do".

*smile* Understood.

How to handle stock splits in TickZOOM hasn't been pondered until this thread.

It would be, I think, rather simple to add this to TZ for data it records.

But the tricky problem is always what historical data you can get, and especially whether it is already back-adjusted.

It seems that the quality of historical data available for TickZOOM is rather weak compared to what it collects itself.

This seems to be an intractable problem.

I solve it by using a combination of data. For stocks, that would be interesting to explore.

Wayne

mikesmithv · Jan 19, 2009

I looked at my long rambling posts this weekend and thought "OMG I've turned into Jack Hershey" so I am VERY glad to hear it made sense to you. I don't know why I felt compelled to focus on these sticky issues. Full speed ahead in the direction you are going, by all means!

WyckoffTrader · Jan 19, 2009

Wayne,

We have been recording level two data (five deep) on the major futures markets and may be able to get you the last six months to one year of data for the project you are working on. What is the best format to plug into your software?

I will ask my programmer if this can be done in your required format.

Trade well.

jmmcox · Jan 19, 2009

Does anyone have a list of the formats TickZoom can support? Does it at least support ASCII text files (OHLC)?