Tick Database Implementations

Discussion in 'Data Sets and Feeds' started by fatrat, Nov 25, 2006.

  1. The HDF Group Home Page

    Hi. Just to point out an alternative
    and share my preferences for streaming data capture and db tools.
    If you have any questions, feel free to ask.
    Naturally, it's open source software.

    -kt

    Introduction to the HDF5 Packet Table API (more info)

    [Image: an example of variable-length packet capturing.]
    Boeing's Flight Test Instrumentations Group and the HDF5 development group at the University of Illinois have developed a library that is particularly suited for "packet" data, data that arrives in streams of packets from instruments at potentially very high speeds.

    The proliferation of sensors and other instruments introduces enormous challenges to data management. Even for a single event, incoming synchronized time-sequenced data can have many sources, and the number of incoming data streams, as well as the types of data, can be large. In Boeing's flight test data applications, for instance, data arrives from test aircraft, voice communications, video, ground, satellite tracking, and other sources. This data must be gathered, integrated, processed, visualized, and archived. Similar scenarios exist for many different applications, such as environmental monitoring, vehicle testing, and medicine.

    The collection and storing of these kinds of data historically have been reduced to unique in-house implementations. There is surprisingly little sharing of these infrastructure technologies even within an application area, let alone across application domains, resulting in frequent and costly re-invention of the same technologies.

    HDF5 provides, in a single package, many of the capabilities that otherwise have to be developed from scratch. HDF5 can store virtually any kind of scientific or engineering data and can mix any number of objects of different types in a single container. HDF5 can support different access patterns, simplified data integration, datatype translation, fast I/O, and visualization and analysis software.
     
    #41     Jan 10, 2007
  2. nitro

    ktmexc2,

    While I absolutely love HDF5, and HDF5 is moving in the direction of implementing a stream database, it is, in its current implementation, nowhere near a streamdb.

    For example, the most obvious omission is streamSQL.

    nitro
     
    #42     Jan 10, 2007
  3. Hi nitro,

    I have never used SQL (or a relational db) and have only briefly touched upon learning it. From my layman's point of view, I would rather do queries in the native programming language and not in some query language tied to the db.

    From what I understand, this type of db is unnecessary for my work. I could possibly understand it for enterprise use, I guess. I just don't yet see the advantages of a relational db, given the cost in run-time efficiency.

    Mind you that HDF5 does internally support storage reference structures, iterators, partial I/O, point/hyperslab selection and a parallel implementation.

    So are you saying that, just because it's not tied to SQL, it's not suitable? I don't understand.

    Thanks,
    kt
     
    #43     Jan 10, 2007
  4. I guess nitro hasn't been able to make his way back here to help me with my understanding.

    Can any of you help me to understand nitro's comment? Hopefully he'll also chime in when he's able to.

    Thank you,
    kt
     
    #44     Jan 11, 2007
  5. schan

    Thanks kt and rosy for mentioning HDF5 and PyTables, because they are just awesome.

    On a laptop with a slow hard drive, using Python, I'm writing 1 million tick messages (date, time, price, size, total, bid, ask) in 4 seconds, with a compressed size of 3.6 MB (< 4 bytes per row!). Retrieval time is a non-issue.
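
    To see why a figure of a few bytes per row is plausible for tick data, here is a minimal sketch using only the Python standard library. It is not PyTables or HDF5: the record layout, the synthetic ticks and the delta-encoding step are all assumptions made for illustration, and zlib stands in for whatever compression filter the real file uses. The actual ratio depends entirely on the data.

```python
import struct
import zlib

# Hypothetical layout for the seven fields named in the post:
# (date, time, price, size, total, bid, ask), all as 32-bit ints.
# Time is seconds since midnight; prices are integer cents.
TICK = struct.Struct("<7i")  # 28 bytes per row before compression


def make_ticks(n):
    """Generate synthetic, slowly varying ticks (not real market data)."""
    ticks = []
    price, total = 10_000, 0
    for i in range(n):
        price += (i % 3) - 1            # moves by -1, 0 or +1 cent
        size = 100 * (1 + i % 5)
        total += size
        ticks.append((20070111, 34_200 + i // 10, price, size, total,
                      price - 1, price + 1))
    return ticks


def delta_encode(rows):
    """Replace each row by its field-wise difference from the previous row."""
    prev, out = (0,) * 7, []
    for row in rows:
        out.append(tuple(a - b for a, b in zip(row, prev)))
        prev = row
    return out


def delta_decode(deltas):
    """Invert delta_encode by accumulating the differences."""
    prev, out = (0,) * 7, []
    for d in deltas:
        prev = tuple(a + b for a, b in zip(d, prev))
        out.append(prev)
    return out


def pack(rows):
    return b"".join(TICK.pack(*r) for r in rows)


def unpack(blob):
    return list(TICK.iter_unpack(blob))


ticks = make_ticks(10_000)
raw = pack(ticks)
compressed = zlib.compress(pack(delta_encode(ticks)), 6)
restored = delta_decode(unpack(zlib.decompress(compressed)))

print("raw bytes/row:       ", len(raw) / len(ticks))
print("compressed bytes/row:", len(compressed) / len(ticks))
```

    Consecutive ticks differ very little, so the deltas are mostly small repeating numbers, which is exactly what a general-purpose compressor handles well.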

    I won't be able to update this fast with SQL. Maybe with an API.

    Perhaps SQL and StreamSQL are nice when formulating more complex queries and joining tables.

    This paper has some examples: http://nms.csail.mit.edu/~stavros/pubs/osfa.pdf

    However, for me, backtesting requires just reading a slice of one table and looping through the trades. I used a flat file for fast reads and writes, but HDF5 is faster and better.
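
    The "read a slice, loop through the trades" pattern can be sketched with a plain fixed-record flat file. This is only a stand-in for a slice read on an HDF5 table; the record layout, the file name and the VWAP calculation are hypothetical choices for the example.

```python
import os
import struct
import tempfile

# Hypothetical fixed-size trade record: (timestamp in seconds, price in cents, size).
TRADE = struct.Struct("<3i")


def write_trades(path, trades):
    """Write trades back to back as fixed-size binary records."""
    with open(path, "wb") as f:
        for trade in trades:
            f.write(TRADE.pack(*trade))


def read_slice(path, start, stop):
    """Yield rows start..stop-1 without reading the rest of the file."""
    with open(path, "rb") as f:
        f.seek(start * TRADE.size)        # jump straight to the first wanted row
        for _ in range(stop - start):
            chunk = f.read(TRADE.size)
            if len(chunk) < TRADE.size:
                break                     # ran past the end of the file
            yield TRADE.unpack(chunk)


def vwap(trades):
    """Volume-weighted average price over an iterable of trades."""
    notional = volume = 0
    for _, price, size in trades:
        notional += price * size
        volume += size
    return notional / volume if volume else None


path = os.path.join(tempfile.mkdtemp(), "trades.bin")
write_trades(path, [(34_200 + i, 10_000 + i, 100) for i in range(1_000)])

window = list(read_slice(path, 200, 300))  # only these 100 rows are read
result = vwap(window)
print(result)
```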

    One does on-the-fly split adjustment; another forwards the first arriver.
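
    If "on-the-fly split adjustment" is read as rescaling pre-split ticks while they stream past, rather than rewriting stored data, a minimal sketch could look like this. The tuple layout, the function name and the split list are assumptions for the example, not something taken from the paper.

```python
def split_adjust(ticks, splits):
    """Yield (day, price, size) with pre-split rows rescaled to post-split terms.

    ticks  : iterable of (day, price, size), day as an int such as 20070111
    splits : list of (effective_day, ratio); ratio 2.0 means a 2-for-1 split
    """
    for day, price, size in ticks:
        factor = 1.0
        for effective_day, ratio in splits:
            if day < effective_day:       # this tick predates the split
                factor *= ratio
        yield day, price / factor, size * factor


stored = [(20070110, 100.0, 10), (20070111, 50.0, 20)]
adjusted = list(split_adjust(stored, splits=[(20070111, 2.0)]))
print(adjusted)
```

    The stored rows stay untouched, so a later correction to the split list only changes what the reader sees.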
     
    #45     Jan 11, 2007
  6. Why does it need to be a streamdb?

    Could a combination of HDF5 and Esper:

    http://esper.codehaus.org/

    be a suitable solution? Esper uses EQL, which, although it is not streamSQL, presumably has some semantic similarities.

    HDF for storage. Esper for stream analysis. All bases covered?
     
    #46     Jan 11, 2007
  7. nitro

    Esper _is_ an attempt at a stream database... EQL, I believe, comes very close to, if it does not fully implement, the Streaming SQL standard.

    I know of Esper, and it is a perfectly good solution, were it not written in Java. But I don't want to get into a computer programming language war...


    ktmex, I will get to your question later...

    nitro
     
    #47     Jan 11, 2007
  8. Nitro knows more than I, but I don't see how the hdf5 packet table interface would not be considered an efficient stream db... minus the sql syntax, which I don't yet understand as being so beneficial.

    Hell, since it's in use with Boeing systems and they helped the NCSA* with its design, you would think that it's pretty much top notch. I would guess.


    [*] National Center for Supercomputing Applications.
     
    #48     Jan 11, 2007
    For the system I am working on, my thinking was to pass the various event streams through Esper for ESP needs such as feed monitoring, indicator construction, book analysis, etc., and then persist relevant streams using HDF5. The event streams could then be played back from the HDF5 storage.
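
    The shape of that pipeline (analyse the stream, persist it, play it back) can be sketched with Python generators. This is only an illustration of the data flow: the moving average stands in for a stream query that Esper would express in EQL, JSON lines stand in for HDF5 storage, and every name here is hypothetical.

```python
import io
import json
from collections import deque


def moving_average(events, window):
    """Annotate each event with the mean price of the last `window` events."""
    recent = deque(maxlen=window)
    for event in events:
        recent.append(event["price"])
        yield {**event, "avg": sum(recent) / len(recent)}


def persist(events, sink):
    """Append each event to `sink` as one JSON line, then pass it along."""
    for event in events:
        sink.write(json.dumps(event) + "\n")
        yield event


def playback(source):
    """Re-read persisted events in their original order."""
    for line in source:
        yield json.loads(line)


feed = [{"t": i, "price": p} for i, p in enumerate([10.0, 11.0, 12.0, 13.0])]
store = io.StringIO()                      # stand-in for the on-disk store

live = list(persist(moving_average(feed, window=2), store))

store.seek(0)
replayed = list(playback(store))
print(replayed == live)
```

    Because playback yields the same event dictionaries as the live feed, the same analysis code can run against either source.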

    As far as I know, Esper doesn't concern itself with persistence matters, hence I saw the two products as being complementary at this juncture. Perhaps they are both converging and eventually they will have overlapping functionality.

    No need to get into a language debate, you can just state the facts as you see them for why, in this instance, Java is not suitable for this purpose. I have no other way of knowing :D
     
    #49     Jan 11, 2007
  10. I understand, at least superficially, the huge potential and benefits of having streaming SQL abilities and I'm quite excited about incorporating it into my event-based ATS. However, I believe it's actually answering a different but related question to the one this thread was discussing :confused:

    Agree, Boeing's involvement does lend a great deal of credibility to the product. I like what I see so far...
     
    #50     Jan 11, 2007