Tick Database Implementations

Discussion in 'Data Sets and Feeds' started by fatrat, Nov 25, 2006.

  1. fatrat

    I'm curious what automated system traders who write in C/C++ do for storing their tick data for market analysis. Having written a lot of code for hedge fund automated trading systems used by quants, I'm now trying to rebuild from scratch what I've seen, for my own benefit. The hedge fund I worked for didn't actually maintain a tick database. They used conventional data structures to make markets. This led to headaches, and I want to attempt a slightly different implementation.

    I'm not a DBA, but I set up SQL Server 2005 with a series of stored procs for storing market information. Databases are slow, so I created a quote- and tick-posting queue to which market-data threads asynchronously post their information; worker threads commit the entries to the DB later. The software also keeps a sliding window that serves as a short-term cache of recent tick data, so lookups for recent ticks don't hit the database directly. My models that are highly dependent on short-term data don't suffer performance penalties when reaching back for historical data, provided it fits in the sliding window.
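
    A minimal sketch of the posting queue described above, in standard C++11. The names Tick, TickQueue, and the sink callback are hypothetical and stand in for the real record layout and the stored-proc call; a single worker thread is assumed for simplicity.

```cpp
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical tick record; the field names are illustrative only.
struct Tick {
    std::int64_t timestamp_ns;
    double price;
    std::uint32_t size;
};

// Market-data threads call post(); one worker thread drains the queue in
// batches and hands each batch to a sink (a stand-in for the DB commit).
class TickQueue {
public:
    using Sink = std::function<void(const std::vector<Tick>&)>;

    explicit TickQueue(Sink sink)
        : sink_(std::move(sink)), worker_([this] { run(); }) {}

    // Flushes whatever is still pending, then stops the worker.
    ~TickQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_one();
        worker_.join();
    }

    // Cheap for the caller: a short critical section and a notify.
    void post(const Tick& tick) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push_back(tick);
        }
        ready_.notify_one();
    }

private:
    void run() {
        std::vector<Tick> batch;
        for (;;) {
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return stopping_ || !pending_.empty(); });
                if (pending_.empty()) return;  // stopping and fully drained
                batch.swap(pending_);          // take everything queued so far
            }
            sink_(batch);  // the slow commit runs outside the lock
            batch.clear();
        }
    }

    Sink sink_;
    std::mutex mutex_;
    std::condition_variable ready_;
    std::vector<Tick> pending_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the other members exist before it starts
};
```

    Swapping the whole pending vector out under the lock keeps producers from waiting on the database, and it lets the sink commit many ticks per round trip instead of one.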

    I modeled the tick-data management system like a processor cache, with the database playing the role of main memory. So far so good: my program was able to keep up with the NASDAQ's QQQQ, tick for tick, including order-add/order-remove for the Level-2 books.
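
    The cache analogy can be sketched roughly as follows: lookups inside the window are served from memory, and anything older falls through to a loader callback that stands in for the database query. TickWindow, Loader, and at_or_before are hypothetical names, and the sketch assumes ticks are pushed in non-decreasing timestamp order.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <iterator>
#include <utility>

// Hypothetical tick record; the field names are illustrative only.
struct Tick {
    std::int64_t timestamp_ns;
    double price;
};

// Fixed-size window of the most recent ticks. Lookups that fall inside the
// window are "cache hits"; older ones go to the loader ("main memory").
class TickWindow {
public:
    // Loader fills `out` and returns true if the backing store has a tick
    // at or before the requested timestamp.
    using Loader = std::function<bool(std::int64_t, Tick&)>;

    TickWindow(std::size_t capacity, Loader loader)
        : capacity_(capacity), loader_(std::move(loader)) {}

    // Assumes ticks arrive in non-decreasing timestamp order.
    void push(const Tick& tick) {
        if (capacity_ == 0) return;
        if (window_.size() == capacity_) window_.pop_front();  // evict oldest
        window_.push_back(tick);
    }

    // Finds the latest tick with timestamp <= ts.
    bool at_or_before(std::int64_t ts, Tick& out) {
        if (!window_.empty() && ts >= window_.front().timestamp_ns) {
            // First tick strictly after ts; the one before it is the answer.
            auto it = std::upper_bound(
                window_.begin(), window_.end(), ts,
                [](std::int64_t value, const Tick& t) { return value < t.timestamp_ns; });
            out = *std::prev(it);
            return true;
        }
        return loader_(ts, out);  // miss: fall through to the slow path
    }

private:
    std::size_t capacity_;
    Loader loader_;
    std::deque<Tick> window_;
};
```

    A per-product window keeps the binary search short; whether one window per product scales to many products is exactly the open question in this post.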

    There's room for improvement, however. I have not tested with any products other than QQQQ, and I'm trying to address the scalability issues of handling multiple products. Rather than reinvent the wheel and spend considerable time and energy implementing a robust tick database, I'd like to hear what alternatives other people are using.

    Are you guys using any third-party solutions for tick databases? Do you have in-memory database solutions? What solutions have you come up with to: 1) minimize latency, and 2) provide efficient access to historical data without significant overhead?

    In addition, are any of you willing to share an efficient DB schema for storing ticks, or are you aware of a freely available schema that would help improve tick DB performance? Do you use separate databases for individual products, or do you put all products into one database? Do you run your modeling software on a different system than your database server? What is your network topology with regard to the database server?

    Where have your bottlenecks come from in the past? In the event of a database server failure, how do you deal with the situation? Do you utilize redundancy for data-storage in case of a failure?
     
  2. rosy2

    Usable tick databases/storage are a problem everywhere. I know some places are using http://kx.com/ for faster access. Some firms capture everything in flat files, then load it into a non-relational database. How will you deal with corporate actions?
     
  3. Caching recent ticks in memory is a good approach. Maybe try compressing historical data and storing it on the file system. SQL isn't optimal for time series...
     
  4. Any suggestions for optimal file system arrangements?

    When you say compressing historical data, do you mean compressing ticks into bars, for example, or do you mean zip-style data compression?
     
  5. fatrat

    He means data compression.
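
    The thread doesn't name a specific scheme. As one possible illustration of data compression for tick files, the sketch below delta-encodes consecutive ticks and writes each delta as a zigzag varint. This is a swapped-in technique, not what any poster says they use. The Tick layout (prices held as integer multiples of the minimum increment) and the function names are assumptions, and decode() does no validation of malformed input.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record: price stored as an integer count of minimum increments.
struct Tick {
    std::int64_t timestamp_ns;
    std::int64_t price_ticks;
};

// Zigzag maps small negative and positive deltas to small unsigned values.
// Relies on arithmetic right shift of negative values, which mainstream
// compilers provide.
inline std::uint64_t zigzag(std::int64_t v) {
    return (static_cast<std::uint64_t>(v) << 1) ^ static_cast<std::uint64_t>(v >> 63);
}
inline std::int64_t unzigzag(std::uint64_t u) {
    return static_cast<std::int64_t>(u >> 1) ^ -static_cast<std::int64_t>(u & 1);
}

// Varint: 7 payload bits per byte, high bit set on every byte except the last.
inline void put_varint(std::vector<std::uint8_t>& out, std::uint64_t u) {
    while (u >= 0x80) {
        out.push_back(static_cast<std::uint8_t>(u | 0x80));
        u >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(u));
}
inline std::uint64_t get_varint(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    std::uint64_t u = 0;
    int shift = 0;
    while (in[pos] & 0x80) {
        u |= static_cast<std::uint64_t>(in[pos++] & 0x7F) << shift;
        shift += 7;
    }
    u |= static_cast<std::uint64_t>(in[pos++]) << shift;
    return u;
}

// Each tick is stored as (timestamp delta, price delta) from the previous tick.
// Assumes the deltas themselves do not overflow a 64-bit signed integer.
std::vector<std::uint8_t> encode(const std::vector<Tick>& ticks) {
    std::vector<std::uint8_t> out;
    Tick prev{0, 0};
    for (const Tick& t : ticks) {
        put_varint(out, zigzag(t.timestamp_ns - prev.timestamp_ns));
        put_varint(out, zigzag(t.price_ticks - prev.price_ticks));
        prev = t;
    }
    return out;
}

std::vector<Tick> decode(const std::vector<std::uint8_t>& bytes) {
    std::vector<Tick> ticks;
    Tick prev{0, 0};
    std::size_t pos = 0;
    while (pos < bytes.size()) {
        prev.timestamp_ns += unzigzag(get_varint(bytes, pos));
        prev.price_ticks += unzigzag(get_varint(bytes, pos));
        ticks.push_back(prev);
    }
    return ticks;
}
```

    Consecutive ticks usually differ by small amounts, so most deltas fit in one to four bytes instead of eight. A general-purpose compressor could be layered on top of the encoded bytes.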
     
  6. fatrat

    I saw their product offering, but wasn't amused with the idea that they were using .NET. Not that I have anything against .NET -- I love .NET. I just don't think that's the appropriate framework for the absolute lowest level components of a trading system.

    I'm amused that so many software development departments around here on Wall St. are switching to .NET. It'll save them many headaches at the application level, but I question the employers who claim to be "real-time" while using reflection and garbage collection in user-mode processes.
     
  7. FWIW, Valdis is the producer of said QuantDeveloper software. The documentation linked to mentioned that reflection was avoided.

    What framework/languages do you think ARE appropriate for the absolute lowest level components of a trading system?
     
  8. fatrat

    Yes, I realized they dropped reflection. I wasn't referring to Valdis, just to some of the firms that claim to be "real-time" when they know they can never guarantee a hard real-time, or even a reasonable soft real-time, response if a garbage collection happens to be scheduled somewhere along the way.

    Back in the day, we had a large debate at one of the companies I worked for about the performance of .NET applications versus C++ or C applications for communications. We knew that the .NET framework (1.1) at the time was built on top of the NT kernel's I/O completion ports framework. Our measurements in 2003-2004 showed that it did a very good job, but in terms of both size and speed, a skilled C/C++ programmer would always beat it.

    We went with .NET anyway, because the cost in development time was significantly lower. I didn't mind the decision, and the way the application was built, we could easily throw more hardware at the problem.

    My performance-driven approach is always to write a rock-solid "kernel" of sorts in C/C++ and build layers on top of it. When there are applications that don't require the fastest performance, such as data visualization, I'll usually use COM or DCOM to talk to .NET components.

    The reason I like the Microsoft platform as opposed to Linux is the wide array of choices I have for breaking the system up. I can write the fastest parts as device drivers if I want to, then write the GUI and high-level stuff in C# and retrieve data through COM components.

    That's my preference, anyway.
     
  9. fatrat

    Clarification here -- when I say device drivers, I mean kernel code. I used the term device driver because a driver is the easiest way to get something to run inside kernel space.
     
    #10     Nov 27, 2006