Why use a database?

Discussion in 'Data Sets and Feeds' started by onelot, Oct 9, 2004.

  1. > I'm building "KDB for the rest of us"

    Already done... it's called QuantServer, and it processes 1M ticks per second :)

    www.smartquant.com

    Cheers,
    Anton
     
    #111     Jul 27, 2005
  2. All due respect Anton, but isn't there any place else to peddle your way over-priced sw?
     
    #112     Jul 27, 2005
  3. Anton,

    Would you care to elaborate on your technology? An architectural overview would help. Unless, of course, you're just dumping serialized .NET objects to disk and compressing the data stream. I don't think that would compare to KDB.

    Thanks, Joel

    Disclaimer: For a year I sold the predecessor to QuantDeveloper, and that was all I did. I also looked at the source code and still chat with customers from time to time. There are some long-term disagreements between Anton and me.

    Anton: I _specifically_ do not want this thread to degenerate into a "mine is bigger" competition. Please post an overview of your technology for the benefit of everyone.
     
    #113     Jul 27, 2005
    I'm more interested in how to describe chart patterns in computer programs; the time frame of the chart is irrelevant. Could anyone point me to resources, including source code, on this?
     
    #114     Jul 27, 2005
  5. That's exactly what I think.
    In fact, the initial poster correctly talks about gigabyte sizes. I'm running closer to 100 gigabytes myself. I couldn't imagine any rational way of handling and exploiting this store of tick data WITHOUT a sophisticated database infrastructure. If you try to do it without a database, you're back in the stone age of computing, and you'll drive yourself nuts reinventing a poor database kludge.

    As to 'speed', I honestly don't see the problem. I'm collecting huge amounts of tick data in real time and I'm still very far from reaching the limits of my db. Of course, it is easy to write a crippled piece of software that chokes. IMHO, that just means you have to think harder.

    WITHOUT A DATABASE, YOU'LL STAY A LOSER
     
    #115     Jul 27, 2005
  6. >All due respect Anton, but isn't there any place else to peddle your way over-priced sw?

    OK, continue discussing open-source and $99 solutions. The right price for a product is the one the market accepts... IMHO :)

    If you think it's overpriced, the only way to prove that is to either write it yourself or point to a cheaper alternative.

    KDB is in the 100K range; QuantServer is in the 1K range. Both perform about the same when it comes to market data capture and playback for strategy simulation and historical data requests.

    As for the underlying technology... Well, KDB writes a large flat binary file with time-ordered data records, so data processing operations run at SCSI/IDE I/O speed, no surprise. I guess a DateTime search looks like Stream.Seek(...). QuantServer adds buffering and compression. The underlying technology is not a secret: it's based on the TTree concept from root.cern.ch. The CERN guys write and process terabytes of data under a gigabytes-per-second incoming load (nuclear events). QuantServer uses a similar approach tuned for time-series financial data processing. So there it is. No need to discuss which one is bigger (partly because you don't have one at all to start with :)) - go and get it for free.
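    The "flat binary file with time-ordered records, searched via Seek" approach described above can be sketched roughly as follows. This is a guess at the mechanics, not either vendor's actual code: the 16-byte little-endian (timestamp, price) record layout and the `find_first_at_or_after` helper are invented for illustration.

```python
import struct
import io

# Hypothetical fixed-width record: (timestamp, price) as two float64s.
REC = struct.Struct("<dd")

def find_first_at_or_after(f, ts, n_records):
    """Binary-search a time-ordered flat file of fixed-size records by
    seeking directly to record boundaries -- no external index needed."""
    lo, hi = 0, n_records
    while lo < hi:
        mid = (lo + hi) // 2
        f.seek(mid * REC.size)           # the Stream.Seek(...) step
        rec_ts, _price = REC.unpack(f.read(REC.size))
        if rec_ts < ts:
            lo = mid + 1
        else:
            hi = mid
    return lo  # index of first record with timestamp >= ts

# Demo: ten ticks at t = 0, 10, 20, ..., 90 written to an in-memory "file".
buf = io.BytesIO()
for t in range(0, 100, 10):
    buf.write(REC.pack(float(t), 100.0 + t))

idx = find_first_at_or_after(buf, 35.0, 10)  # first record at t >= 35
```

    The point is that with fixed-width, time-sorted records, a timestamp lookup costs O(log n) seeks and the subsequent read is a sequential scan at raw disk speed.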

    PS. I don't think Joel's comments are relevant. He left SmartQuant Ltd before we launched the QuantServer and QuantDeveloper projects, so "looking into the source code" is somewhat misleading :p

    Regards,
    Anton
     
    #116     Jul 27, 2005
    trader99

    nononsense,

    Thanks for your informative post. At the risk of sounding like a "loser", I'm just learning about DBs and Access. Looks cool and reasonable enough. Note: NOT THAT I WOULD use Access for tick data storage or anything that serious.

    I understand all the benefits of a db - security, blah blah, etc. What I don't understand is the connection between the db and the backtesting software.

    So don't you still have to pull the data out of the db and store it in some format? An array? Or some complex data structure that you populate? I'm a bit confused. So one would write SQL commands to pull the data, put it into a data structure, and then use that for tick-level backtesting?

    If you can clarify, that would help a lot. Also, isn't SQL more of an interactive prompt at the DB end, rather than something at the programming-language end like VB, C++, Java, Python, etc.? Doesn't one have to use some kind of ADO.NET or other DB API?
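    For what it's worth, the flow being asked about is roughly: the SQL query pulls rows out of the db, you load them into an in-memory structure, and the backtest loop iterates over that structure. A minimal sketch using Python's built-in sqlite3 as a stand-in for any DB API; the `ticks` table, its columns, and the VWAP computation are invented for illustration.

```python
import sqlite3
from collections import namedtuple

Tick = namedtuple("Tick", "ts price size")

# In-memory DB with a hypothetical `ticks` table standing in for the real store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticks (ts REAL, price REAL, size INTEGER)")
conn.executemany("INSERT INTO ticks VALUES (?, ?, ?)",
                 [(1.0, 100.0, 5), (2.0, 100.5, 3), (3.0, 100.25, 7)])

# 1) SQL pulls the rows out of the DB...
rows = conn.execute("SELECT ts, price, size FROM ticks ORDER BY ts")
# 2) ...which get loaded into an in-memory data structure...
ticks = [Tick(*r) for r in rows]
# 3) ...that the backtest logic then iterates over (here: a toy VWAP).
vwap = sum(t.price * t.size for t in ticks) / sum(t.size for t in ticks)
```

    In .NET the shape is the same: ADO.NET executes the SQL and hands back a reader, and you materialize the rows into your own arrays or objects before the backtest touches them.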

    please help! thanks.

    trader99
     
    #117     Nov 30, 2005
    Pondracer Guest

    I'm working on a system now that stores current data in a SQL Server db, and then I use cubes for my archives. Not sure this is the best approach, but it's what I'm familiar with. I'm a developer, but this will be my first trading app (personal use only).
     
    #118     Dec 2, 2005
    koistya

    From Wikipedia: "Although T-trees seem to be widely used for main-memory databases, recent research indicates that they actually do not perform better than B-trees on modern hardware"

    ...

    Is anyone interested in joining this project: a local market data server built on top of Microsoft SQL Server 2012 and .NET/C++/C#?

    http://github.com/kriasoft/market-data
     
    #119     Dec 9, 2012
  10. Just got done architecting an expansive equities tick repository.

    Some stats:

    Symbols: 20,653
    Period: 2008 - 2012

    Bars25ms: 27,741,118,213
    Bars1Sec: 14,007,345,833
    Bars1Min: 2,141,219,516
    Messages: 742,640,774,253
    Ask Changes: 24,825,915,500
    Bid Changes: 24,722,845,608
    Orders: 37,906,709,939
    Volume: 10,485,098,567,764

    Dedicated Servers: 5
    Data Storage: 20TB

    We chose a hybrid Hadoop-style implementation with SQL access.
    To say we were I/O bound was an understatement.

    We are now able to locate and access any tick of any instrument nearly instantaneously (<10 ms). The data is stored multiple times, with different optimizations to accelerate performance.

    Different structures are used for pairs analysis, graphing bars, index analysis, etc., with extensive use of covering indexes (where the index itself contains the answer data).
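    The covering-index idea can be demonstrated with any SQL engine; here is a minimal sketch using Python's built-in sqlite3. The `bars` table and index names are invented for illustration - the point is that when the index columns cover everything the query needs, the engine answers from the index alone without touching the table, which SQLite reports directly in its query plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bars (symbol TEXT, ts INTEGER, close REAL)")

# Covering index: (symbol, ts, close) holds every column the query below
# touches, so the base table never needs to be read for that query.
conn.execute("CREATE INDEX ix_bars_cover ON bars (symbol, ts, close)")

conn.executemany("INSERT INTO bars VALUES (?, ?, ?)",
                 [("AAPL", 1, 10.0), ("AAPL", 2, 10.5), ("MSFT", 1, 25.0)])

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT ts, close FROM bars WHERE symbol = 'AAPL'"
).fetchall()

# SQLite's plan text says "USING COVERING INDEX" when the index suffices.
covered = any("COVERING INDEX" in row[-1] for row in plan)
```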

    One of our driving forces in building out this data repository was that the commercially available consolidated data was fundamentally flawed, being built around last-trade data. Exchange tape data is too slow to process for most of our algos.

    We build our bars differently, using ask/bid changes as the trigger rather than last-trade data. Consequently, our backtested results nearly match our real-time executions. This is especially true when trading pairs and other cross-exchange correlated instruments.

    We're contemplating making access to these structures available as a service... renting out VMs with direct access to our 20TB repository... Send me a PM if you're interested.
     
    #120     Dec 9, 2012