Database Synchronization

ET151 · Jul 6, 2010

I am currently logging tick data into binary files on one computer (Computer A). I am looking for a database to store the data in, and I also want to be able to query Computer A to backfill my charting software on another computer (Computer B). After backfilling, I want Computer A to forward to Computer B all received ticks for the instrument(s) that Computer B is monitoring. I know that relaying data is not a good idea for a true automated HFT system. However, I am not doing HFT, so the added latency should be OK for now, though I'd like to keep it to a minimum. I am using Linux on both systems.

Does anyone know of a good open-source database solution and a method for relaying the ticks? Would master-slave database replication be the way to go? At this point, my database would be not much larger than a couple of GB, and I could flush it to binary files at the end of each week to keep it small if necessary.

WinstonTJ · Jul 6, 2010

Are you looking to do this realtime?

Two solutions come to mind. First, run RAID1 (mirror) in a single computer (so you have things backed up) and then create a shared folder/mapped network drive and point Computer B to the shared folder. You would essentially be sharing the information.

Second, get a NAS (cheap $100 D-Link, etc.), run RAID1 (mirror) on that so that it's backed up and redundant, then point both Computer A and Computer B to the NAS folders.

ET151 · Jul 6, 2010

That would handle the backup and data duplication, but how would Computer B know when new data is available? I am running a charting application on Computer B, and it needs to be notified when new data is available to pull. Otherwise it just checks continuously in an infinite loop, which would be a bit of a waste of system resources. I would like to use an event listener to notify Computer B of new data.

If I were going to do this from scratch, I would use a concurrent DB that supports single writer, multiple reader (SWMR). Sunday night, when I start my data logging on Comp A and before Comp B is started, the data goes into the database. Monday morning I turn on Comp B, connect to the tick server process running on Comp A, and subscribe to the instruments that I am interested in. After I subscribe, Comp A immediately notifies Comp B that it has data up to index xxxxxxx, and Comp B then retrieves this data from Comp A. Comp B pulls the data either directly out of a copy of Comp A's DB on RAID, as you suggested, or I use a master-slave config and have A synchronize a duplicate DB on Comp B. I think your idea is better because it requires less computational overhead for Comp A: RAID will not consume any more CPU / memory resources on Comp A than what I have now. This event-driven protocol is basically the same model that my data provider uses.
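To make the subscribe, notify, backfill, then relay sequence concrete, here is a rough in-memory sketch in Python. It is only an illustration under my own assumptions: `TickServer`, `Tick`, `on_tick`, `subscribe` and `backfill` are hypothetical names, a plain list stands in for the SWMR database, and both "computers" live in one process with no network in between.

```python
from collections import defaultdict
from typing import Callable, List, NamedTuple


class Tick(NamedTuple):
    symbol: str
    price: float
    size: int


class TickServer:
    """Hypothetical stand-in for the tick server process on Computer A."""

    def __init__(self) -> None:
        self._log = defaultdict(list)          # symbol -> ticks in arrival order
        self._subscribers = defaultdict(list)  # symbol -> listener callbacks

    def on_tick(self, tick: Tick) -> None:
        """Single writer: store the tick, then relay it to any subscribers."""
        ticks = self._log[tick.symbol]
        ticks.append(tick)
        index = len(ticks) - 1
        for callback in self._subscribers[tick.symbol]:
            callback(index, tick)

    def subscribe(self, symbol: str, callback: Callable[[int, Tick], None]) -> int:
        """Register a listener and report how many ticks are already stored."""
        self._subscribers[symbol].append(callback)
        return len(self._log[symbol])

    def backfill(self, symbol: str, start: int, stop: int) -> List[Tick]:
        """Return stored ticks with indices in [start, stop)."""
        return self._log[symbol][start:stop]


# Computer A has been logging since Sunday night.
server = TickServer()
server.on_tick(Tick("ES", 1020.25, 3))
server.on_tick(Tick("ES", 1020.50, 1))

# Computer B starts Monday morning: subscribe first, then backfill.
chart: List[Tick] = []
have = server.subscribe("ES", lambda index, tick: chart.append(tick))
chart[:0] = server.backfill("ES", 0, have)  # prepend history ahead of live ticks

# A tick that arrives after the subscription is relayed immediately.
server.on_tick(Tick("ES", 1020.75, 2))
```

Subscribing before backfilling means no tick can fall into the gap between the two steps; the index returned by `subscribe` tells Comp B exactly how much history to request.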

What's left to figure out is how the event listener should be implemented. Can I leverage open-source software already in existence? If I had to do this part from scratch, I would register an event listener with my tick logger, and it would notify Computer B whenever new data is available. That's not too complicated, but does anyone see a better way? Is there some facility built into databases that can provide a notification upon update?
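For the from-scratch version, one possible shape is a plain socket that Computer B blocks on, so it wakes only when Computer A sends something instead of spinning in a loop. The sketch below is an assumption-laden illustration, not a finished design: `relay` and `listen` are hypothetical helper names, the newline-delimited JSON message format is my own choice, and `socket.socketpair()` stands in for a real TCP connection between the two machines.

```python
import json
import socket
from typing import Callable, Set


def relay(sock: socket.socket, subscribed: Set[str], tick: dict) -> None:
    """Computer A side: forward a tick only if Computer B subscribed to it."""
    if tick["symbol"] in subscribed:
        sock.sendall((json.dumps(tick) + "\n").encode("utf-8"))


def listen(sock: socket.socket, on_tick: Callable[[dict], None]) -> None:
    """Computer B side: block until a tick arrives, then hand it to on_tick."""
    with sock.makefile("r", encoding="utf-8") as stream:
        for line in stream:  # sleeps inside the read; no busy polling
            on_tick(json.loads(line))


# socketpair() gives two connected sockets in one process (demo only).
a_side, b_side = socket.socketpair()

subscribed = {"ES"}  # instruments Computer B asked for
relay(a_side, subscribed, {"symbol": "ES", "price": 1020.25, "size": 3})
relay(a_side, subscribed, {"symbol": "NQ", "price": 1800.0, "size": 1})
a_side.close()  # end of stream lets the listen loop finish in this demo

received = []
listen(b_side, received.append)
b_side.close()
```

In a real setup the listener would run in its own thread or event loop on Computer B and the connection would stay open for the whole session.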

WinstonTJ · Jul 6, 2010

If you use a shared network drive and a shared database such as MSFT Access, you can either refresh manually, write a script to auto-refresh, or just let your charts refresh when you open a new one.

This is very easy. PM me with some other type of contact and I'll be happy to walk you through this. It should take 10 minutes and be all set; if you can pull/record data, you should be able to figure this out.

ET151 · Jul 10, 2010

I think this might be a better approach - Tokyo Cabinet:

http://www.youtube.com/watch?v=2k1J7Vn4EDg
