How do you guys store tick data?

Discussion in 'Data Sets and Feeds' started by mizhael, Jun 10, 2010.

  1. A real HFT system, if well thought out, has the data change the queries in real time. You can't do that in a relational DB; you need KDB or some event-based processor.
     
    #51     Jun 19, 2010

  2. Why KDB? For "scalable" and "column oriented" there are other options out there. How many terabytes of data do you have? How fast do you need to process queries? Is this for HFT? Are you serving data to a team of 100 employees or is it just you? What types of queries does your data need to be optimized for? Are you using RAID or SAN or grid/cloud?

    I do web programming, and the standards for storage there are MySQL, PostgreSQL, Oracle, and SQL Server. Things might move to the cloud soon (for example Amazon), but I can only see that being useful if you have *tons* of data. I see MATLAB offers parallel computing; you might look into that.
     
    #52     Jun 19, 2010

  3. Databases take care of this thinking and optimization for you. They are faster than a developer who doesn't realize when b-trees (and other data structures) are faster than arrays. On Linux you can also optimize the file system. My point is, a database might be binary files, but one binary file system can outperform another binary file system on the same hardware. To give a concrete example, a tuned install of MySQL on a Linux box will be much faster than a typical homebrew Windows solution.
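
    As a small illustration of the data-structure point: binary search over a sorted array is the in-memory cousin of the B-tree index lookups a database does for you. A rough sketch in Python with made-up timestamps (the `ticks_in_range` helper is hypothetical, not from any library):

```python
import bisect

# Hypothetical tick timestamps (seconds since midnight), already sorted.
timestamps = [34200.0 + 0.5 * i for i in range(10_000)]

def ticks_in_range(ts, start, end):
    """Return the sorted timestamps t with start <= t < end, found by binary search."""
    lo = bisect.bisect_left(ts, start)  # first index with ts[i] >= start
    hi = bisect.bisect_left(ts, end)    # first index with ts[i] >= end
    return ts[lo:hi]

window = ticks_in_range(timestamps, 34210.0, 34212.0)
```

    Each lookup costs O(log n) comparisons instead of a full scan, which is the kind of choice an index makes automatically.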
     
    #53     Jun 19, 2010

  4. If this is indeed for HFT, then yeah. I've heard it called event stream processing (ESP) or complex event processing (CEP). Oracle does have something called Oracle CEP.
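
    For anyone unfamiliar with the idea, here is a toy sketch of event stream processing in Python: instead of querying stored data, a standing "query" is evaluated against each tick as it arrives. The class name, window and threshold are invented for illustration; real ESP/CEP engines are far more elaborate.

```python
from collections import deque

class MovingAverageAlert:
    """Standing query: flag a tick priced more than `threshold`
    above the mean of the previous `window` ticks."""

    def __init__(self, window, threshold):
        self.prices = deque(maxlen=window)  # rolling window of recent prices
        self.threshold = threshold
        self.alerts = []

    def on_tick(self, price):
        # Evaluate only once the window is full, then slide it forward.
        if len(self.prices) == self.prices.maxlen:
            mean = sum(self.prices) / len(self.prices)
            if price - mean > self.threshold:
                self.alerts.append(price)
        self.prices.append(price)

query = MovingAverageAlert(window=3, threshold=1.0)
for p in [10.0, 10.0, 10.0, 10.5, 12.0, 10.0]:
    query.on_tick(p)
```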
     
    #54     Jun 20, 2010
  5. januson

    Hi StoxTrader, thank you for commenting.
    You of course could not know this, but my background in SQL Server goes back to 1998.

    My point in my posts is that if one knows what and how to program, DBs are outperformed by orders of magnitude.

    MySQL is indeed a fast DB and will in some cases be faster than MS SQL or Oracle, but the decision is rarely based on only one thing. Other features, such as failover, high availability, mirroring, tracing and optimizing, are also considered when choosing a DB.

    But storing ticks is very, very simple due to the format, consistency and streaming nature of the data.
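
    A minimal sketch of what that simplicity can look like in Python: fixed-width binary records appended to a flat file, so the n-th tick sits at a known byte offset. The record layout (timestamp, price, size) and the helper names are assumptions for illustration, not a description of anyone's actual format.

```python
import os
import struct
import tempfile

# Assumed record layout: timestamp in microseconds (int64), price (float64),
# size (uint32). "<" means little-endian with no padding, so 20 bytes per tick.
TICK = struct.Struct("<qdI")

def append_ticks(path, ticks):
    """Append (timestamp_us, price, size) tuples to the end of the file."""
    with open(path, "ab") as f:
        for t in ticks:
            f.write(TICK.pack(*t))

def read_tick(path, index):
    """Read the index-th record by seeking straight to index * record size."""
    with open(path, "rb") as f:
        f.seek(index * TICK.size)
        return TICK.unpack(f.read(TICK.size))

path = os.path.join(tempfile.mkdtemp(), "ticks.bin")
append_ticks(path, [(1276992000000000, 1.2345, 100),
                    (1276992000000500, 1.2346, 200)])
second = read_tick(path, 1)
```

    Appends are sequential writes and random access is a single seek, which is why this kind of layout is hard for a general-purpose DB to beat on raw speed. It gives you none of the failover or tracing features mentioned above.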
     
    #55     Jun 20, 2010
  6. nbates

    If you're using day bars, a database is fine; for anything else, flat files or pure "in-memory" storage would be my suggestion. Just my thought and opinion.
     
    #56     Jun 21, 2010
  7. HFT is overused and misunderstood. A true HFT system does not have time to pull data, process it and adjust accordingly during a trade or even intra-day (without shutting down & restarting the system). There simply isn't enough time intra-trade to process data and adjust your positions prior to exit.

    Sending a lot of orders =/= HFT, needing fast execution =/= HFT.
     
    #57     Jun 22, 2010
  8. ET151

    I just found this - Tokyo Cabinet:

    http://www.youtube.com/watch?v=2k1J7Vn4EDg

    -Up to 8 EB of data storage
    -Concurrent
    -Various ways of storing data: hash table, B-Trees, fixed length arrays
    -Very, very fast
    -Free and Open-Source
     
    #58     Jul 10, 2010

  9. Make sure to test any of these new NoSQL databases like Tokyo Cabinet before using them on an actual system that trades real money. For example, there are reports of MongoDB dropping/deleting/corrupting data. Likewise, if you disable logging, consistency checks, etc. in PostgreSQL, it's as fast as any NoSQL database... at the expense of having no logging or consistency checks. "Yay it's blazing fast!" "Hey uh where did my last 6 months of data go?" If you need blazing fast, I would recommend using an in-memory database and/or solid state drives.

    I have not heard anything negative about Tokyo Cabinet, I'm just sayin'.
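
    To show what "in-memory database" means in practice, here is a small sketch using SQLite's `:memory:` mode from Python's standard library. SQLite is only a convenient stand-in, and the table layout and sample rows are made up.

```python
import sqlite3

# ":memory:" keeps the whole database in RAM; it vanishes when the
# connection closes, which is exactly the durability trade-off at issue.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ticks (ts INTEGER, symbol TEXT, price REAL, size INTEGER)"
)
rows = [
    (1, "ABC", 10.00, 100),
    (2, "ABC", 10.05, 200),
    (3, "XYZ", 20.00, 50),
]
conn.executemany("INSERT INTO ticks VALUES (?, ?, ?, ?)", rows)

# Latest price for one symbol.
last_abc = conn.execute(
    "SELECT price FROM ticks WHERE symbol = ? ORDER BY ts DESC LIMIT 1",
    ("ABC",),
).fetchone()[0]
```

    Anything you want to keep past a crash or restart still has to be written out to disk somewhere.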
     
    #59     Jul 10, 2010
  10. Before you get all hot & bothered about the stock Backblaze, talk to a really good IT/network guy/gal. The unit in stock form has limitations, mostly with the RAID cards and the backplanes, and secondly with the choice of a 775 socket (E8400).

    I swapped in different backplanes, RAID cards and motherboard (i7) so that the unit could handle the I/O that I need. This thing is designed for archive cloud storage: think Mozy or Iron Mountain, not even the Photobucket/Snapfish level. If you want to be pulling data on the fly (mid-trade or during execution) and do computations based on that data, then this isn't the solution for you. I still think people are crazy thinking they need to record realtime tick data + look at accurate historicals all for an on-the-fly algo. Unless you trade very few symbols, the sheer size of the data gets overwhelming quickly.
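
    A back-of-envelope estimate makes the size point concrete. Every input below is an illustrative assumption, not a measurement; plug in your own numbers.

```python
# All inputs are illustrative assumptions, not measured values.
BYTES_PER_TICK = 20                 # e.g. int64 timestamp + float64 price + uint32 size
TICKS_PER_SYMBOL_PER_DAY = 500_000  # assumed average for an actively quoted symbol
SYMBOLS = 1_000
TRADING_DAYS_PER_YEAR = 252

def yearly_bytes(bytes_per_tick, ticks_per_day, symbols, days):
    """Raw, uncompressed storage needed for one year of ticks."""
    return bytes_per_tick * ticks_per_day * symbols * days

total = yearly_bytes(BYTES_PER_TICK, TICKS_PER_SYMBOL_PER_DAY,
                     SYMBOLS, TRADING_DAYS_PER_YEAR)
total_tb = total / 1e12  # decimal terabytes
```

    Under those assumptions that is roughly 10 GB per day and about 2.5 TB per year before indexes, redundancy or backups.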
     
    #60     Jul 10, 2010