How do you guys store tick data?

Discussion in 'Data Sets and Feeds' started by mizhael, Jun 10, 2010.

  1. ET151

    I have not tested it out yet, but Tokyo Cabinet claims to store 1 million records in under a second (hash table mode) or 1.6 seconds (B-tree mode). Are you sure PostgreSQL can keep up with that even without consistency checks? A million record stores in 1.6 seconds is pretty damn fast.

    http://www.igvita.com/2009/02/13/tokyo-cabinet-beyond-key-value-store/

    It doesn't look too hard to set up, so I hope to test it out soon.

    Update: I checked; even SQLite claims to be much faster than PostgreSQL (http://www.sqlite.org/speed.html).
     
    #61     Jul 10, 2010
  2. nbates

    Solid state drives & flat files... a database in the path of execution, or anywhere near market data, is a pox on quality.
     
    #62     Jul 10, 2010
  3. ET151

    nbates, I've asked this question in another thread... if you go with flat files, are you able to access data in the file while it is being written? The benefit of a database is that you can read and write to it concurrently (single writer, multiple readers).
     
    #63     Jul 10, 2010
  4. ET151

    Umm...just found something better than Tokyo Cabinet...

    http://1978th.net/kyotocabinet/

    (http://1978th.net/tech-en/promenade.cgi?id=7)
     
    #64     Jul 10, 2010
  5. nbates

    It really does not matter whether or not you have asked the question in another thread; I answered it in this one. A database is "nothing more" than a flat file for those who do not know and/or understand how to use threads and critical sections to implement concurrent store-and-fetch trade history and time-series storage systems.

    A database is one-size-fits-ALL; don't try running the 100-yard dash at the Olympics expecting to win in those shoes, lol
     
    #65     Jul 10, 2010
  6. ET151

    Yes, but as I asked previously, are you actually able to read all the information you have written to your flat file while it is open and being written to, without having cached all of the file's contents in memory? The question here is whether it is worth developing such code or simply using a very powerful database solution that I can have running in less than a day. I can always write to the database during the week and then dump to a file when the market closes. I'm not claiming that this is the ultimate, final approach, but I like leveraging other people's work as much as possible.
     
    #66     Jul 10, 2010
  7. nbates

    Good question!

    The approach I've found best is to cache a certain amount of data in memory and periodically, based on time or volume, push chunks out to disk by appending to a file. When a request comes from the client application that needs more than what's currently in the cache, read the file and append whatever is currently in the memory cache, if there is any, to the stored data that was fetched.

    I use a critical section around functions like "add_point" and "get_series", which are each driven by a different thread; 99.9% of the time there's no contention, and when there is, it's undetectable.
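
    A minimal Java sketch of that pattern, assuming one writer thread calling add_point and any number of reader threads calling get_series; the class name, the one-line-per-bar format, and the flush threshold are illustrative, not from this post:

    import java.io.BufferedWriter;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.List;

    // Sketch of a bar store: appends chunks to a flat file, keeps the newest bars in memory.
    public class BarStore {
        private final String path;
        private final List<String> cache = new ArrayList<>();
        private final int flushThreshold;

        public BarStore(String path, int flushThreshold) {
            this.path = path;
            this.flushThreshold = flushThreshold;
        }

        // Writer thread: add one bar; push a chunk out to disk once the cache is big enough.
        public synchronized void add_point(String bar) throws IOException {
            cache.add(bar);
            if (cache.size() >= flushThreshold) {
                flush();
            }
        }

        // Reader thread: everything already on disk, plus whatever is still only in memory.
        public synchronized List<String> get_series() throws IOException {
            List<String> series = new ArrayList<>();
            if (Files.exists(Paths.get(path))) {
                series.addAll(Files.readAllLines(Paths.get(path)));
            }
            series.addAll(cache); // append the not-yet-flushed tail
            return series;
        }

        // Append the cached bars to the file, then clear the cache.
        private void flush() throws IOException {
            try (BufferedWriter out = new BufferedWriter(new FileWriter(path, true))) {
                for (String bar : cache) {
                    out.write(bar);
                    out.newLine();
                }
            }
            cache.clear();
        }
    }

    Here the synchronized methods play the role of the critical section, and since add_point usually just appends to an in-memory list, contention stays rare.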

    The thing is, with a database you can only do some number of transactions per second (pick a number), and if you are storing a bar for each of 40,000 stocks at whatever interval [I do it on a one-second interval], then you are limited by the database: at one second, that's 40,000 inserts per second just to keep up.

    Instead, store bars in memory and compress them using something like a "rep_count": if the next bar to store equals the last one, just set "rep_count=2" (and so on) instead of storing it again. You can fit a hell of a lot more that way, which is what you're after if performance is the goal or an objective!
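
    As a rough sketch of that rep_count idea, assuming bars are compared for equality and stored as (bar, rep_count) pairs; the names here are illustrative:

    import java.util.ArrayList;
    import java.util.List;

    // Sketch of run-length encoding for bars: identical consecutive bars
    // collapse into one entry with a repeat count.
    public class BarRle {
        public static class Entry {
            public final String bar; // e.g. a serialized OHLC bar
            public int repCount = 1; // how many consecutive intervals it covers
            Entry(String bar) { this.bar = bar; }
        }

        private final List<Entry> entries = new ArrayList<>();

        public void add(String bar) {
            int last = entries.size() - 1;
            if (last >= 0 && entries.get(last).bar.equals(bar)) {
                entries.get(last).repCount++; // same bar again: bump the count
            } else {
                entries.add(new Entry(bar)); // new bar: start a new run
            }
        }

        public int storedEntries() { return entries.size(); }
    }

    For a thinly traded symbol at a one-second interval, long runs of identical bars collapse into single entries, which is where the savings over one-insert-per-bar database writes come from.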
     
    #67     Jul 10, 2010

  8. One config change in PostgreSQL makes the speed similar to CouchDB, Tokyo Tyrant, Redis, MongoDB, Cassandra and Project Voldemort, and PostgreSQL can be tuned to make it even faster:
    http://www.pgcon.org/2010/schedule/attachments/141_PostgreSQL-and-NoSQL.pdf
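
    The PDF isn't quoted in the post, so as an assumption on my part: the commonly cited "one config change" of this kind is turning off synchronous commits in postgresql.conf, which trades durability of the most recent transactions for write speed:

    # postgresql.conf -- assumption: the "one config change" is synchronous_commit
    synchronous_commit = off  # commits return before the WAL is flushed to disk;
                              # a crash can lose the last few transactions, but
                              # the database itself is not corrupted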
     
    #68     Jul 10, 2010
  9. ET151

    Yes, for that application, I would be working with flat files as you are doing. For now, I found a very simple solution similar to what's described in this thread:

    http://www.daniweb.com/forums/thread272775.html

    I plan to stick with CSV files for now...eventually I will convert them all to binary, but that's lower on my list of things to do. I can do what WinstonTJ suggested in that other thread I mentioned earlier (if you want to see it, it's one of the 4-5 posts that I have made on here). Basically, I have one computer logging my data. I open up the data log directory to my local network as read-only. Then I simply read the log files on the client machine as they are being written. Two ways to do this:

    1) On the client computer, open the log file and read until readLine() == null. Then pause and check readLine() again in 500 ms, 1 sec, etc. (again, this is not an automated trading app...). The only trick is to either ensure that only complete lines are ever written to the file, OR, if a line does not end with the newline character, back up one line (not sure how to do that in Java besides marking every line as I read the file); see the sketch after this list.

    2) Once all but the last few lines have been read, stop, establish a socket connection with the server computer, and subscribe to the instrument of interest. The server computer then forwards all market information over the socket connection. Buffer that data, then advance the file reader until the line read from the file matches the first line of the data buffered from the socket. Then close the file reader and use only the data fed over the socket connection.
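
    A minimal Java sketch of approach 1), assuming a line-oriented CSV log; the file name is illustrative, and the incomplete-line handling (remember the offset of the last complete line and seek back to it) is one way to do the "back up one line" step, not necessarily the only one:

    import java.io.IOException;
    import java.io.RandomAccessFile;

    // Sketch of a tail-follower for a CSV tick log that another machine is
    // appending to over a read-only share.
    public class LogFollower {

        public static void main(String[] args) throws IOException, InterruptedException {
            try (RandomAccessFile file = new RandomAccessFile("ticks.csv", "r")) {
                long committed = 0; // end offset of the last complete line accepted

                while (true) {
                    String line = file.readLine();
                    if (line == null) {
                        Thread.sleep(500); // at EOF: pause, then poll again
                    } else if (lastByteWasNewline(file)) {
                        committed = file.getFilePointer();
                        System.out.println(line); // complete line: hand it to the app
                    } else {
                        file.seek(committed); // partial line at EOF: back up one line
                        Thread.sleep(500); // and retry once more has been written
                    }
                }
            }
        }

        // True if the byte just before the current position is '\n', i.e. the
        // line we just read was terminated rather than cut off mid-write.
        private static boolean lastByteWasNewline(RandomAccessFile f) throws IOException {
            long pos = f.getFilePointer();
            if (pos == 0) return false;
            f.seek(pos - 1);
            return f.read() == '\n'; // read() restores the pointer to pos
        }
    }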
     
    #69     Jul 10, 2010