Database Synchronization

Discussion in 'Data Sets and Feeds' started by ET151, Jul 6, 2010.

  1. ET151


    I am currently logging tick data into binary files on one computer (Computer A), but I am looking for a database to store the data in. Furthermore, I want to be able to query Computer A to backfill my charting software on another computer, Computer B. After backfilling, I want Computer A to relay all received ticks for the instrument(s) being monitored by Computer B. I know that relaying data is not a good idea for a true automated HFT system; however, I am not doing HFT, and that latency should be OK for now, though I'd like to keep it to a minimum. I am using Linux on both systems. Does anyone know of a good open-source database solution and a method for relaying the ticks? Would master-slave database replication be the way to go? At this point my database would be no larger than a couple of GB, and I could flush it to binary files at the end of each week to keep it small if necessary.
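    For reference, the binary tick files mentioned above could be simple fixed-width records. A minimal sketch, assuming a hypothetical record layout (epoch-microsecond timestamp, price, size) that is not the poster's actual format:

    ```python
    import struct

    # Hypothetical fixed-width tick record (illustrative, not the OP's layout):
    # < little-endian; q = int64 timestamp (us); d = float64 price; i = int32 size
    TICK_FMT = struct.Struct("<qdi")

    def append_tick(path, ts_us, price, size):
        """Append one packed tick record to a binary log file."""
        with open(path, "ab") as f:
            f.write(TICK_FMT.pack(ts_us, price, size))

    def read_ticks(path):
        """Read every fixed-width record back out of the log."""
        with open(path, "rb") as f:
            data = f.read()
        return [TICK_FMT.unpack_from(data, off)
                for off in range(0, len(data), TICK_FMT.size)]
    ```

    Fixed-width records make backfill by index trivial: record N lives at byte offset N * record_size, so a range query is a seek plus a read.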
  2. Are you looking to do this in real time?

    Two solutions come to mind. First, run RAID1 (mirror) on a single computer (so you have things backed up), then create a shared folder/mapped network drive and point Computer B to the shared folder. You would essentially be sharing the information.

    Second, get a NAS (a cheap $100 D-Link, etc.), run RAID1 (mirror) on it so that it's backed up and redundant, then point both Computer A and Computer B to the NAS folders.
  3. ET151


    That would handle the backup and data duplication, but how would Computer B know when new data is available? I am running a charting application on Computer B, and it needs to be notified when new data is available to pull...or else it has to check continuously in an infinite loop, which is a waste of system resources. I would like to use an event listener to notify Computer B of new data.
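    The polling-versus-notification distinction above can be sketched with a condition variable: the consumer blocks until the logger signals new data, instead of spinning in a loop. A minimal in-process sketch (class and method names are illustrative, not from the poster's system):

    ```python
    import threading

    class TickBuffer:
        """Blocking notification instead of busy-polling: the consumer
        waits on a condition variable until the writer signals that new
        data has arrived. Illustrative sketch only."""

        def __init__(self):
            self._ticks = []
            self._cond = threading.Condition()

        def append(self, tick):
            """Called by the tick logger (the single writer)."""
            with self._cond:
                self._ticks.append(tick)
                self._cond.notify_all()  # wake any waiting consumers

        def wait_for_new(self, last_seen, timeout=None):
            """Block until more than `last_seen` ticks exist, then
            return the ticks the caller has not yet seen."""
            with self._cond:
                self._cond.wait_for(lambda: len(self._ticks) > last_seen,
                                    timeout)
                return self._ticks[last_seen:]
    ```

    The consumer sleeps inside `wait_for` and uses no CPU until `append` wakes it, which is exactly the resource saving over an infinite check loop.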

    If I were going to do this from scratch, I would use a concurrent DB that supports a single writer and multiple readers (SWMR). Sunday night, when I start my data logging on Comp A and before Comp B is started, the data goes into the database. Monday morning I turn on Computer B, connect to the tick-server process running on Comp A, and subscribe to the instruments I am interested in. After subscribing, Comp A immediately notifies Comp B that it has data up to index xxxxxxx, and Comp B then retrieves that data from Comp A.

    Comp B pulls the data either directly out of a copy of Comp A's DB on RAID, as you suggested, or via a master-slave configuration in which A synchronizes a duplicate DB on Comp B. I think your idea is better because it requires less computational overhead on Comp A -- RAID will not consume any more CPU/memory on Comp A than what I have now. This event-driven protocol is basically the same model my data provider uses.
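    The subscribe-then-backfill handshake described above can be modeled in a few lines. A toy in-process sketch (all names hypothetical): on subscribe, the server reports its highest stored index so the client can backfill up to it, after which new ticks are pushed to the registered callback.

    ```python
    class TickServer:
        """Toy model of the subscribe/backfill/stream protocol.
        Real deployment would put this behind a network socket."""

        def __init__(self):
            self._store = {}  # symbol -> list of stored ticks
            self._subs = {}   # symbol -> list of subscriber callbacks

        def log(self, symbol, tick):
            """Comp A's logger: store the tick, then push it live."""
            self._store.setdefault(symbol, []).append(tick)
            for cb in self._subs.get(symbol, []):
                cb(tick)

        def subscribe(self, symbol, callback):
            """Comp B subscribes; returns the current last index,
            i.e. the 'data up to index xxxxxxx' notification."""
            self._subs.setdefault(symbol, []).append(callback)
            return len(self._store.get(symbol, [])) - 1

        def backfill(self, symbol, upto):
            """Return all stored ticks up to and including `upto`."""
            return self._store.get(symbol, [])[:upto + 1]
    ```

    Anything logged between the backfill reply and the subscription taking effect must still reach the client, which is why subscribing *before* backfilling (as in this sketch) avoids a gap.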

    What remains to be figured out is how the event listener should be implemented. Can I leverage existing open-source software? If I had to build this part from scratch, I would register an event listener with my tick logger that notifies Computer B whenever new data is available...not too complicated, but does anyone see a better way? Is there some facility built into databases that can provide a notification on update?
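    On the last question: some databases do have this built in -- PostgreSQL, for example, provides LISTEN/NOTIFY, where a trigger can push a notification to connected clients on insert. Rolling it by hand is also simple: Comp B holds one TCP connection open and blocks on a read, so no polling is needed. A minimal sketch (port, message format, and class name are all illustrative):

    ```python
    import socket
    import threading

    class TickNotifier:
        """Toy push-notification channel: Comp B connects once and
        blocks on the socket; Comp A writes one line per event."""

        def __init__(self):
            # Port 0 asks the OS for any free port.
            self._srv = socket.create_server(("127.0.0.1", 0))
            self.port = self._srv.getsockname()[1]

        def serve_one(self, message):
            """Comp A side: accept a single subscriber, push one
            event line, then shut down (sketch only)."""
            conn, _ = self._srv.accept()
            with conn:
                conn.sendall(message)
            self._srv.close()

    def wait_for_event(port):
        """Comp B side: block until the next event line arrives."""
        with socket.create_connection(("127.0.0.1", port)) as sock:
            return sock.makefile("rb").readline()
    ```

    The client thread costs nothing while blocked in `readline`; the kernel wakes it only when the notification bytes arrive, which gives the event-listener behavior asked about without any refresh loop.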
  4. If you use a shared network drive and a shared database such as MSFT Access, you can refresh manually, write a script to auto-refresh, or just let your charts refresh when you open a new one.

    This is very easy - PM me with some other type of contact info and I'll be happy to walk you through it. Ten minutes and it should be all set - if you can pull/record data, you should be able to figure this out.
  5. ET151