Tick Database Implementations

Discussion in 'Data Sets and Feeds' started by fatrat, Nov 25, 2006.

  1. a5519

    #11     Nov 27, 2006
  2. The best thing is to NOT use an RDBMS when working with time series data. Instead, use a linear database with memory mapping. Because records are stored in time order, you can locate a record by triangulating on its date/time. For example, see RMD Server: <a href="http://www.modulusfe.com/rmdserver/">www.modulusfe.com/rmdserver</a>. It can compress ticks into any bar format and broadcast data to clients in real time. You can set permissions for each user, set entitlements, etc. RMD Server is aimed more at ticker plants like eSignal, DTN, etc. But if you just need something simple for a desktop solution, I recommend that you write a large linear memory-mapped file and cache the file offset in an __int64 each time you read/write. Get a fast, cheap SATA drive such as the WD Raptor 10k. Nothing is faster.

    Richard
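
A minimal sketch of the linear memory-mapped file idea described above, for illustration only. The record layout (`RECORD`), the function names, and the use of a binary search to locate a timestamp are assumptions made for this example, not details of RMD Server or of the poster's own code. Python integers are not limited to 32 bits, so byte offsets past 2GB need no special type here, which is the role `__int64` plays in C.

```python
import mmap
import struct

# Hypothetical fixed-width record: int64 timestamp (epoch ms), float64 price, int64 size.
RECORD = struct.Struct("<qdq")


def write_ticks(path, ticks):
    """Append ticks (assumed already sorted by timestamp) as fixed-width records."""
    with open(path, "ab") as f:
        for ts, price, size in ticks:
            f.write(RECORD.pack(ts, price, size))


def first_at_or_after(mm, ts):
    """Binary-search the mapped file for the index of the first record with timestamp >= ts."""
    lo, hi = 0, len(mm) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        # Only the leading timestamp field is decoded during the search.
        (mid_ts,) = struct.unpack_from("<q", mm, mid * RECORD.size)
        if mid_ts < ts:
            lo = mid + 1
        else:
            hi = mid
    return lo


def read_range(path, start_ts, end_ts):
    """Return every tick with start_ts <= timestamp < end_ts."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        begin = first_at_or_after(mm, start_ts)
        end = first_at_or_after(mm, end_ts)
        return [RECORD.unpack_from(mm, i * RECORD.size) for i in range(begin, end)]
```

Because every record has the same width, record `i` always starts at byte `i * RECORD.size`, so a time-span lookup touches only the pages the search and the result actually need.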
     
    #12     Dec 4, 2006
  3. Here's what I use. It's about as reasonably low-level as you can go and still allow data storage to be seamlessly transferable to various arches.

    Also check out the new packet-table interface for immediate storage of exceptionally fast streaming data without the need for queuing.

    HDF5

    It's open source, written in C, and it has APIs for several other languages.

    -kt
     
    #13     Dec 4, 2006
  4. I see they have gotten over the 2GB limit now. Not bad.

    Richard
     
    #14     Dec 4, 2006
  5. cashcow

    There are APIs (within the database) for storing large amounts of data quickly in both MSSQL and Oracle. My experience is that with a decent SCSI drive you can easily get MSSQL to store 3000+ ticks per second.
    When developing tick databases, it is worth checking whether the speed is being limited by the hard drive. More often than not, it is.
     
    #15     Dec 9, 2006
  6. Yes, but then later try to do a select on a certain span of data or request bar history, and you'll see the difference between linear databases such as SunGard's data server, RMD Server, HDF5, etc. and RDBMSs like SQL Server, Oracle, Access, etc. The problem is in retrieving the data. RDBMSs are not optimized for retrieving time series data.

    Richard
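
To make "select a span of data" and "request bar history" concrete, here is a rough sketch over ticks that are already held in time order. The tuple layout, the dictionary keys, and the function name `ticks_to_bars` are assumptions for this example and do not reflect how any of the products named above implement it.

```python
from bisect import bisect_left


def ticks_to_bars(ticks, start_ts, end_ts, bar_seconds):
    """Aggregate time-ordered (timestamp, price, size) ticks in [start_ts, end_ts) into OHLCV bars."""
    # Locate the span with two binary searches instead of scanning every tick.
    timestamps = [t[0] for t in ticks]
    lo = bisect_left(timestamps, start_ts)
    hi = bisect_left(timestamps, end_ts)

    bars = []
    for ts, price, size in ticks[lo:hi]:
        bar_start = ts - ts % bar_seconds  # align the tick to its bar boundary
        if bars and bars[-1]["start"] == bar_start:
            bar = bars[-1]
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price
            bar["volume"] += size
        else:
            bars.append({"start": bar_start, "open": price, "high": price,
                         "low": price, "close": price, "volume": size})
    return bars
```

With time-ordered storage the span is one contiguous slice, so the bars can be built in a single forward pass.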
     
    #16     Dec 9, 2006
  7. cashcow

    The original question was for "market analysis" - as SQL Server contains analysis tools (especially geared to time-series data) I would still deem it a good choice.

    Had the system been geared towards streaming data to clients, then I agree, a linear database would be better suited.
     
    #17     Dec 9, 2006
  8. I would say that if it is suitable for your app's runtime efficiency needs, then all the power to ya. If not, then you gotta do what you gotta do.

    -kt
     
    #18     Dec 9, 2006
  9. rosy2

    ktmexc20

    Are you using HDF5 for storage and backtesting purposes, or do you use it in realtime? I am looking into using it via Python's PyTables module. So far it looks good and easy to work with. Have you had any problems so far?
     
    #19     Dec 27, 2006
  10. Hi Rosy,

    are you using HDF5 for storage and backtesting purposes

    Yes, I use it for all of my data storage needs, for the reasons mentioned in my previous post.

    or do you use it in realtime?

    Yes, that's where it shines. It has, in recent versions, something called "packet tables" for capturing and storing extremely fast streaming data into a formatted structure without the need for using queues or the like.

    I am looking into using it via Python's PyTables module. So far it looks good and easy to work with. Have you had any problems so far?

    I used to use PyTables when I was working in Python, and I think it's excellent for its seamless integration from the HDF5 backend to Python, NumPy, and Matplotlib. The combination is very good for crunching numbers, considering that HDF5 and NumPy are embedded "C". PyTables does not implement that "packet table" interface yet, but you can safely do what you need to do by using queues when capturing streaming data.


    -kt
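
The queue approach mentioned above could look roughly like the following sketch, which uses only the standard library. The names `capture`, `writer_loop`, and `append_batch` are hypothetical. `append_batch` stands in for whatever call stores a list of rows (with PyTables it would presumably wrap a table append and a flush, which is an assumption here and not shown).

```python
import queue
import threading

_STOP = object()  # sentinel telling the writer thread to finish


def writer_loop(q, append_batch, batch_size=1000):
    """Drain the queue and hand ticks to storage in batches."""
    batch = []
    while True:
        item = q.get()
        if item is _STOP:
            break
        batch.append(item)
        if len(batch) >= batch_size:
            append_batch(batch)
            batch = []  # start a fresh list so the stored batch is not mutated
    if batch:
        append_batch(batch)  # write whatever is left at shutdown


def capture(tick_source, append_batch, batch_size=1000):
    """Push ticks from tick_source onto a queue while a background thread stores them."""
    q = queue.Queue()
    writer = threading.Thread(target=writer_loop, args=(q, append_batch, batch_size))
    writer.start()
    for tick in tick_source:
        q.put(tick)  # the feed side never waits on disk I/O
    q.put(_STOP)
    writer.join()
```

The queue decouples the feed handler from the disk: a slow write delays only the writer thread, and batching keeps the number of storage calls small. A real capture loop would probably also flush on a timer so a quiet market does not leave a partial batch sitting in memory.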
     
    #20     Dec 28, 2006