How do you guys store tick data?

Discussion in 'Data Sets and Feeds' started by mizhael, Jun 10, 2010.

  1. If you're storing your data in segments smaller than one year, read in as many as it takes to get a year. If they're > 1 year, you've got some awfully big files there. :)
     
    #11     Jun 10, 2010
  2. GTG

    I store tick events in binary files, separated out by date. Each security gets its own directory, so for example all of my "SPY" tick data is in a directory called "STK_SPY". I load the files a few at a time on another thread as a back-test progresses. 99% of the time I am interested in looking at the data as a sequence of days, i.e. what happened between x and y dates, so these types of queries are very easy to implement with this format.
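
    A minimal sketch of that layout, assuming a hypothetical fixed-width record of int64 epoch-millis timestamp, float64 price, and int32 size (the record format and file naming are assumptions for illustration, not GTG's actual code):

        # Hypothetical reader for per-symbol, per-date binary tick files.
        import os
        import struct
        from datetime import timedelta

        RECORD = struct.Struct("<qdi")  # assumed layout: timestamp_ms, price, size

        def read_day(root, symbol, day):
            """Yield (timestamp_ms, price, size) for one date file, if present."""
            path = os.path.join(root, "STK_" + symbol, day.strftime("%Y%m%d") + ".bin")
            if not os.path.exists(path):
                return
            with open(path, "rb") as f:
                buf = f.read()
            for off in range(0, len(buf), RECORD.size):
                yield RECORD.unpack_from(buf, off)

        def read_range(root, symbol, start, end):
            """Replay ticks for every date in [start, end], in time order."""
            day = start
            while day <= end:
                yield from read_day(root, symbol, day)
                day += timedelta(days=1)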
     
    #12     Jun 10, 2010
  3. januson

    Couldn't agree more.
    Though if speed weren't an issue, I would prefer a DB like SQL Server 2008 :)
     
    #13     Jun 10, 2010
  4. const451

    CSV and binary files look to me like they require some extra programming effort to achieve the flexibility of a database. Why not use SQL Server or MySQL to store data for backtesting? Backtesting does not require real-time performance, so one of those relational databases would probably suffice. I've never used KDB, but it's built for storing data in real time, and that functionality is not needed for backtesting.
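
    A minimal sketch of that relational approach, using SQLite as a stand-in for SQL Server/MySQL (table name, columns, and timestamps are illustrative):

        import sqlite3

        con = sqlite3.connect("ticks.db")
        con.execute("CREATE TABLE IF NOT EXISTS ticks "
                    "(symbol TEXT, ts INTEGER, price REAL, size INTEGER)")
        # A composite index is what makes symbol + time-range queries tolerable.
        con.execute("CREATE INDEX IF NOT EXISTS ix_ticks ON ticks (symbol, ts)")

        # One day of SPY ticks, in epoch-millis (2010-06-10 UTC).
        rows = con.execute(
            "SELECT ts, price, size FROM ticks "
            "WHERE symbol = ? AND ts BETWEEN ? AND ? ORDER BY ts",
            ("SPY", 1276128000000, 1276214400000)).fetchall()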
     
    #14     Jun 10, 2010
  5. promagma

    HDF5 and the file system both give you the ability to group your data (by date, sub-grouped by ticker, like others have said) and store it pre-ordered by time. So if you only have one type of query, you can structure your data for it and completely avoid table scans and sorting. It is basically a straight shot from the disk.
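
    A minimal sketch of that layout with h5py (file, group, and dataset names are illustrative, and the dummy array stands in for real ticks):

        import h5py
        import numpy as np

        tick = np.dtype([("ts", "<i8"), ("price", "<f8"), ("size", "<i4")])

        with h5py.File("ticks.h5", "a") as h5:
            # One group per date, one time-ordered dataset per ticker.
            day = h5.require_group("2010-06-10")
            data = np.zeros(3, dtype=tick)  # stand-in for one day of SPY ticks
            day.create_dataset("SPY", data=np.sort(data, order="ts"))

        with h5py.File("ticks.h5", "r") as h5:
            spy = h5["2010-06-10/SPY"][:]  # contiguous read, already sorted by time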

    I have no experience with KDB but I would guess it is more flexible if you need to query data in many different ways.

    Any of this beats relational databases, which have no concept of pre-ordered data, so you can't even do a quick binary search lookup without a hefty index file.
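
    The pre-ordered point, as a sketch: with timestamps already sorted on disk, a time-range lookup is two binary searches plus one contiguous slice, with no separate index structure needed (values here are made up):

        import numpy as np

        ts = np.array([100, 105, 105, 120, 130, 131], dtype=np.int64)  # sorted tick times
        lo = np.searchsorted(ts, 105, side="left")   # first tick at or after t0
        hi = np.searchsorted(ts, 130, side="right")  # one past the last tick at or before t1
        window = ts[lo:hi]                           # O(log n) to locate, then a straight read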
     
    #15     Jun 10, 2010
  6. Of course speed is a huge issue.

    The goals are:

    1. Store a gigantic amount of data
    2. Run fast queries
     
    #16     Jun 10, 2010
  7. But the backtest still needs to be fast, because you do "optimization" during backtesting...
     
    #17     Jun 10, 2010
  8. Bob111

    They load differently. With a binary file, you just open it and load it straight into memory. With SQL, you have to go through each record to load the data. The difference in time and performance is huge. I remember in the old days that calculations which took an hour with binary files took 5-6 hours when I was storing tick data in MS Access.
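
    A sketch of the difference (file name, table, and schema are hypothetical): the binary path is a single bulk read into a typed array, while the SQL path materializes the result row by row through the driver:

        import sqlite3
        import numpy as np

        tick = np.dtype([("ts", "<i8"), ("price", "<f8"), ("size", "<i4")])

        # Binary: one call slurps the whole day into memory.
        arr = np.fromfile("STK_SPY/20100610.bin", dtype=tick)

        # SQL: every record passes through the cursor individually.
        con = sqlite3.connect("ticks.db")
        rows = [row for row in con.execute(
            "SELECT ts, price, size FROM ticks WHERE symbol = 'SPY' ORDER BY ts")]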
     
    #18     Jun 10, 2010
  9. I generally backtest in R. I can load my data once, then backtest in a variety of different ways (or with different backtest variations, as your comment suggests) without having to read the data again. If I change my code, I just re-source it, and the structure containing my data stays in memory.
     
    #19     Jun 10, 2010
  10. Let's take 6 months of intraday tick data (only the last trade, not bid and offer, and not market-depth data), and let's take QQQQ.

    Can you store the whole data set in memory in R?
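
    For scale, a rough back-of-envelope (the trades-per-day figure is an assumed order of magnitude, not a measurement):

        trades_per_day = 300000   # assumed order of magnitude for QQQQ last-trade ticks
        trading_days = 126        # roughly 6 months of sessions
        bytes_per_tick = 20       # e.g. int64 timestamp + float64 price + int32 size
        print(trades_per_day * trading_days * bytes_per_tick / 1e9, "GB")
        # ~0.76 GB under these assumptions, so holding it in RAM is plausible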
     
    #20     Jun 10, 2010