historical tick data

Discussion in 'Data Sets and Feeds' started by glok_twen, Apr 1, 2008.

  1. glok_twen


    in the market for back data for testing. want it to be clean. do not need nor want to pay for charting/analysis software. just clean historical tick, along with software to build continuous contracts.
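
    For the continuous-contract part, here is a minimal sketch of one common approach, back-adjustment at a roll date. It is not any vendor's software. The `Tick` record, the `back_adjust` function, and the rule of taking the gap from the last old-contract tick to the first new-contract tick are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tick:
    timestamp: str  # ISO-8601 text, which sorts chronologically
    price: float
    volume: int


def back_adjust(front: List[Tick], back: List[Tick], roll_time: str) -> List[Tick]:
    """Stitch two contract months into one series at roll_time.

    Ticks before roll_time come from `front`, shifted by the price gap at
    the roll. Ticks from roll_time onward come from `back`, unchanged.
    """
    old = [t for t in front if t.timestamp < roll_time]
    new = [t for t in back if t.timestamp >= roll_time]
    if not old or not new:
        raise ValueError("need ticks on both sides of the roll")
    # Assumed gap rule: first new-contract price minus last old-contract price.
    gap = new[0].price - old[-1].price
    adjusted = [Tick(t.timestamp, t.price + gap, t.volume) for t in old]
    return adjusted + new
```

    Shifting the older contract keeps recent prices equal to what actually traded, at the cost of historical levels that no longer match the tape.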

    looks like tick data and real tick are the main game. did i miss any? how would you compare/contrast them at the summary level?

  2. One


    Cleaning tick data is more of an art than a science, and I've run into problems with every provider I've used. Mostly I've purchased tick data from Tick Data and generally have faith in their cleaning algorithms, but not without frustrations.

    CQG also offers tick data, which I've compared tick by tick with Tick Data, and I thought Tick Data was better aligned with my specific purposes at the time. On the other hand, for some of the strategies I am looking at now, I think CQG's decisions on which ticks to include and which to discard may be a better match.

    Understanding the cleaning algorithms each provider uses is the only way I know to decide which provider is best for your purposes.
  3. Asada


    I'm considering buying some futures data from Tick Data. Can you explain some of the frustrations? I'm guessing from your post that one of them is their erroneously filtering out some unusual but very real trades. Also, I heard that CQG data is only timestamped down to the minute, vs. to the second with Tick Data. Does one just trust that CQG has the trades in order? Thanks very much.
  4. One



    The source of the frustration is that no single data supplier or data set will meet the needs of every strategy. In my experience, it is not that the supplier is making an error in how they apply the cleaning algorithm.

    For example, it is not unusual for Tick Data to have a different high or low than most other providers for a particular day, say because their algorithm flagged a reported trade at a new high as not meeting their criteria for a legitimate trade. If your system is based on the prices where participants actually took positions, then Tick Data's interpretation works well. On the other hand, if you are interested in price levels that other market participants are watching, then the high you are watching will not be the same high everyone else in the market is watching.
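
    A toy illustration of how that can happen. The rule below is a hypothetical spike filter, not Tick Data's actual algorithm; the function name `filter_spikes`, the window, the threshold, and the prices are all made up.

```python
from statistics import median
from typing import List


def filter_spikes(prices: List[float], window: int = 5, max_dev: float = 1.0) -> List[float]:
    """Drop any price more than max_dev away from the median of the
    last `window` prices that were kept (hypothetical cleaning rule)."""
    kept: List[float] = []
    for p in prices:
        recent = kept[-window:]
        if recent and abs(p - median(recent)) > max_dev:
            continue  # treated as a bad tick and left out
        kept.append(p)
    return kept


# Made-up session: one isolated print at 102.0 sets the raw high.
raw = [100.0, 100.25, 100.5, 102.0, 100.5, 100.25, 100.75]
clean = filter_spikes(raw)

raw_high = max(raw)      # the high other participants saw reported
clean_high = max(clean)  # the high left in the cleaned file
print(raw_high, clean_high)
```

    With these numbers the print at 102.0 is dropped, so the cleaned file carries a lower high than the raw feed.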

    Tick Data used to have a white paper available on their website describing in general terms their cleaning algorithms. You may be right about CQG's time stamp - I seem to remember that being the case. Contact them and they will send you a sample file of tick data.

    Some providers supply a flag for each tick, indicating why it was or was not included in the finished data file. I don't remember seeing this functionality, but it would be ideal if you could pick and choose which algorithms to run when constructing a finished series from all the raw data.
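
    A rough sketch of what per-tick flags with selectable rules could look like. This does not describe any provider's file format; the rule names, reason strings, and the `flag_ticks` function are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A rule sees (last kept price, candidate price) and returns a reason or None.
Rule = Callable[[Optional[float], float], Optional[str]]


def non_positive(prev: Optional[float], price: float) -> Optional[str]:
    return "non_positive_price" if price <= 0 else None


def big_jump(prev: Optional[float], price: float, limit: float = 1.0) -> Optional[str]:
    if prev is not None and abs(price - prev) > limit:
        return "jump_exceeds_limit"
    return None


RULES: Dict[str, Rule] = {"non_positive": non_positive, "big_jump": big_jump}


def flag_ticks(prices: List[float], enabled: List[str]) -> List[Tuple[float, Optional[str]]]:
    """Return (price, flag) pairs. The flag is None for a kept tick,
    otherwise the reason given by the first enabled rule that rejects it."""
    out: List[Tuple[float, Optional[str]]] = []
    prev: Optional[float] = None
    for p in prices:
        reason = None
        for name in enabled:
            reason = RULES[name](prev, p)
            if reason:
                break
        out.append((p, reason))
        if reason is None:
            prev = p  # only kept ticks become the reference price
    return out


raw = [100.0, 100.25, 0.0, 102.0, 100.5]
strict = flag_ticks(raw, ["non_positive", "big_jump"])
loose = flag_ticks(raw, ["non_positive"])
```

    Nothing is deleted, so the same raw file can yield a strict series for one strategy and a looser one for another just by changing the `enabled` list.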

    Good luck!
  5. Asada


    Ah, thanks very much for your help.
  6. bluelou


    Keep in mind that the filters used by Tick Data will do more than "clean" the data series of bad ticks. This probably isn't much of an issue if you're using time bars but it's a huge deal if you're using tick bars.
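
    A small sketch of why the bar type matters. The ticks are made up, and the "filtered" list simply drops two of them by hand to stand in for whatever a vendor's filter removes; `tick_bars` and `time_bars` are illustrative helpers, not anyone's product.

```python
from typing import Dict, List, Tuple

Tick = Tuple[int, float]  # (seconds since session open, price)


def tick_bars(ticks: List[Tick], n: int) -> List[float]:
    """Close of each complete bar built from n consecutive ticks."""
    return [ticks[i + n - 1][1] for i in range(0, len(ticks) - n + 1, n)]


def time_bars(ticks: List[Tick], seconds: int) -> List[float]:
    """Last price in each fixed-length time bucket that has a tick."""
    closes: Dict[int, float] = {}
    for t, price in ticks:
        closes[t // seconds] = price  # later ticks overwrite earlier ones
    return [closes[k] for k in sorted(closes)]


raw = [(1, 100.0), (5, 100.25), (12, 100.25), (20, 100.5),
       (31, 100.5), (40, 100.75), (65, 100.75), (70, 101.0)]
# Stand-in for a filter that removes two perfectly ordinary ticks.
filtered = [t for t in raw if t[0] not in (12, 31)]

print(time_bars(raw, 60), time_bars(filtered, 60))
print(tick_bars(raw, 4), tick_bars(filtered, 4))
```

    Here the 60-second bars close at the same prices either way, while the 4-tick bars differ in both count and closing price once two ticks are gone.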

    I spent about $1000 with Tick Data and found the data to be so clean that it was useless. From what I recall, what would have been approx. 500 MB of uncleaned tick data became <300 MB once cleaned. That's not cleaning the data of bad ticks, that's more like taking steel wool to fine china, getting rid of the engraved details, and saying you washed the dishes.

    In other words, if you run simulations on Tick Data's tick data you should be prepared to pay for their real-time service, too.