Storing tick data with Python

Discussion in 'App Development' started by bantam, Dec 2, 2015.

  1. bantam

  3. nitro

    My current opinion is that, from a persistence point of view, collecting data should use one strategy, and the analysis datastore should sit on top of the system that collects it. In other words, I think the interface for reading data for analysis should be as user friendly as possible, while collecting it in realtime, or reading it from e.g. a TAQ file, should be done with different "database" systems. I don't care that much how fast something is offline.

    I have come to the conclusion that this division hits the right balance, at least for me. Extreme speed-and-compression databases are too hard to use as a researcher given existing open source tools, while the friendly stuff can't keep up in realtime environments.

    I don't want to use Kdb to do research. I want to use python. On the other hand, I probably don't want to use python to store massive amounts of realtime data.
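
    Roughly the shape I have in mind for the friendly layer, as a sketch only (read_ticks and the file layout are made up, and the thing behind it could be kdb, flat files, whatever the collector writes):

        import datetime
        import pandas as pd

        def read_ticks(symbol, day):
            # Research-side API: hide whatever system actually collected the data.
            # Hypothetical layout: one gzipped CSV per symbol per day, written
            # overnight by the realtime collector.
            path = "ticks/%s/%s.csv.gz" % (symbol, day.isoformat())
            return pd.read_csv(path, parse_dates=["timestamp"], index_col="timestamp")

        # Research code only ever sees pandas, never the collector.
        trades = read_ticks("SPY", datetime.date(2015, 12, 2))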
     
    Last edited: Dec 3, 2015
  4. Butterfly

  5. bantam

    GAT, Arctic does look very cool. It's column-oriented and compressed. I wonder at what granularity they store the data. I also wonder if they support queries with arbitrary start/end times, and if so, how they do it. If I can figure out how to access the data with Matlab, then I'll install it and give it a try.
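
    From a quick look at the docs, the arbitrary start/end query I'm wondering about would look something like this with Arctic sitting on a local MongoDB. I haven't run it, so treat the exact calls as my best guess rather than gospel:

        import datetime
        import pandas as pd
        from arctic import Arctic
        from arctic.date import DateRange

        # Tiny stand-in DataFrame with a DatetimeIndex, as Arctic expects.
        idx = pd.date_range('2015-12-02 09:30', periods=5, freq='min')
        trades_df = pd.DataFrame({'price': [100.0] * 5, 'size': [200] * 5}, index=idx)

        store = Arctic('localhost')              # Arctic stores its data in MongoDB
        store.initialize_library('NYSE.ticks')   # one-off setup
        lib = store['NYSE.ticks']
        lib.write('AAPL', trades_df)

        # Read back only a slice of the day.
        item = lib.read('AAPL', date_range=DateRange(
            datetime.datetime(2015, 12, 2, 9, 30),
            datetime.datetime(2015, 12, 2, 9, 32)))
        morning = item.data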

    nitro, I agree. Collecting realtime ticks for many/all instruments is daunting. Even just maintaining 100% uptime seems hard. My next step will be to buy daily TAQ updates that I can process overnight, to keep my backtests up-to-date. Then I can just discard whatever ticks I process throughout the day.

    Butterfly, perhaps the title is misleading. I'm not talking about collecting realtime ticks. I'm talking about creating a disk-based data store that makes it fast to read in all trades for a ticker on a given day. For that purpose, I don't believe there is a faster method. The data can be read using C/C++, but getting it into Python is also fast, since zlib is a compiled library. Of course, applying backtest logic to the ticks once they're in Python will not be fast, but that isn't the point.
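
    To make it concrete, this is roughly what I mean; the record layout below is made up for illustration, not my actual format:

        import zlib
        import numpy as np

        # Hypothetical fixed-width record: timestamp (ns), price, size.
        TICK_DTYPE = np.dtype([("ts", "u8"), ("price", "f8"), ("size", "u4")])

        def write_day(path, ticks):
            # ticks: structured array of one ticker's trades for one day
            with open(path, "wb") as f:
                f.write(zlib.compress(ticks.tobytes(), 6))

        def read_day(path):
            # Decompression happens inside zlib's C code, so reading a whole
            # day into numpy is fast even though we call it from Python.
            with open(path, "rb") as f:
                return np.frombuffer(zlib.decompress(f.read()), dtype=TICK_DTYPE)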

    Has anyone here tried HDF5? I tried once a few years ago, and it was very slow. I must have done something wrong, though, because others say it is fast.
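
    For comparison, here is what I think the same per-ticker/per-day layout would look like in HDF5 via h5py; if I was holding it wrong, this is probably where:

        import h5py
        import numpy as np

        TICK_DTYPE = np.dtype([("ts", "u8"), ("price", "f8"), ("size", "u4")])
        ticks = np.zeros(1000, dtype=TICK_DTYPE)       # stand-in for one day of trades

        # One dataset per ticker per day, chunked and compressed.
        with h5py.File("ticks.h5", "w") as f:
            f.create_dataset("AAPL/2015-12-02", data=ticks,
                             chunks=True, compression="gzip")

        with h5py.File("ticks.h5", "r") as f:
            day = f["AAPL/2015-12-02"][:]              # whole day back into numpy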
     
  6. Gambit

    Thanks. Did you work on this at Man?
     
  7. GAT

    I certainly wasn't responsible for writing it, as I am not by any means a professional programmer! But I probably did do some beta testing of a very early version (it wasn't called Arctic then, so I can't be sure).

    GAT
     
  8. Gambit

    I think nitro covered this already, but doesn't the data have to be cleaned even if it is collected raw?
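
    (The sort of cleaning I mean, roughly; the column names are just for illustration:)

        import pandas as pd

        def basic_clean(trades):
            # trades: DataFrame indexed by timestamp, with 'price' and 'size' columns
            trades = trades[(trades["price"] > 0) & (trades["size"] > 0)]   # drop bad prints
            trades = trades.between_time("09:30", "16:00")                  # drop out-of-session ticks
            med = trades["price"].rolling(50, min_periods=1).median()
            return trades[(trades["price"] / med - 1).abs() < 0.10]         # crude outlier filter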
     
  9. nitro

  10. 2rosy

    #10     Dec 4, 2015