Why use a database?

Discussion in 'Data Sets and Feeds' started by onelot, Oct 9, 2004.

  1. > I'm building "KDB for the rest of us"

    Already done... it's called QuantServer, and it processes 1M ticks per second :)

    www.smartquant.com

    Cheers,
    Anton
     
    #111     Jul 27, 2005
  2. All due respect Anton, but isn't there any place else to peddle your way over-priced sw?
     
    #112     Jul 27, 2005
  3. Anton,

    Would you care to elaborate on your technology? An architectural overview would help. Unless, of course, you're just dumping serialized .NET objects to disk and compressing the data stream. I don't think that would compare to KDB.

    Thanks, Joel

    Disclaimer: For a year I sold the predecessor to QuantDeveloper, and that was all I did. I also looked at the source code and still chat with customers from time to time. There are some long-term disagreements between Anton and me.

    Anton: I _specifically_ do not want this thread to degenerate into a "mine is bigger" competition. Please post an overview of your technology for the benefit of everyone.
     
    #113     Jul 27, 2005
    I'm more interested in how to describe chart patterns in computer programs; the time frame of the chart is irrelevant. Could anyone point me to resources, including source code, on this?
     
    #114     Jul 27, 2005
  5. That's exactly what I think.
    In fact, the initial poster correctly talks about gigabyte sizes. I'm running closer to 100 gigabytes myself. I couldn't imagine any rational way of handling and exploiting this store of tick data WITHOUT a sophisticated database infrastructure. If you try to do it without a database, you're back in the stone age of computing, and you'll drive yourself nuts reinventing a poor database kludge.

    As to 'speed', I honestly don't see the problem. I'm collecting huge amounts of tick data in real time and I'm still very far from reaching the limits of my db. Of course, it is easy to write a crippled piece of software that chokes. IMHO, that just means you have to think harder.

    WITHOUT A DATABASE, YOU'LL STAY A LOSER
     
    #115     Jul 27, 2005
  6. >All due respect Anton, but isn't there any place else to peddle your way over-priced sw?

    OK, continue discussing open-source and $99 solutions. The right price for a product is the one the market accepts... IMHO :)

    If you think it's overpriced, the only way to prove that is to either write it yourself or point to a cheaper alternative.

    KDB is in the 100K range; QuantServer is in the 1K range. Both perform about the same when it comes to market data capture and playback for strategy simulation and historical data requests.

    As for the underlying technology... Well, KDB writes a large flat binary file with time-ordered data records, so data processing operations run at SCSI/IDE I/O speed, no surprise. I guess a DateTime search looks like Stream.Seek(...). QuantServer adds buffering and compression. The underlying technology is not a secret: it's based on the TTree concept from root.cern.ch. The CERN guys write and process terabytes of data under a gigabytes-per-second incoming load (nuclear events). QuantServer uses a similar approach tuned for time-series financial data processing. So there it is. No need to discuss which one is bigger (partly because you don't have one at all to start with :)) - go and get it for free.
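    The "flat binary file with time-ordered records, searched via Seek" approach described above can be sketched roughly as follows. This is a guess at the mechanics, not either vendor's actual code: the 16-byte little-endian (timestamp, price) record layout and the `find_first_at_or_after` helper are invented for illustration.

```python
import struct
import io

# Hypothetical fixed-width record: (timestamp, price) as two float64s.
REC = struct.Struct("<dd")

def find_first_at_or_after(f, ts, n_records):
    """Binary-search a time-ordered flat file of fixed-size records by
    seeking directly to record boundaries -- no external index needed."""
    lo, hi = 0, n_records
    while lo < hi:
        mid = (lo + hi) // 2
        f.seek(mid * REC.size)           # the Stream.Seek(...) step
        rec_ts, _price = REC.unpack(f.read(REC.size))
        if rec_ts < ts:
            lo = mid + 1
        else:
            hi = mid
    return lo  # index of first record with timestamp >= ts

# Demo: ten ticks at t = 0, 10, 20, ..., 90 written to an in-memory "file".
buf = io.BytesIO()
for t in range(0, 100, 10):
    buf.write(REC.pack(float(t), 100.0 + t))

idx = find_first_at_or_after(buf, 35.0, 10)  # first record at t >= 35
```

    The point is that with fixed-width, time-sorted records, a timestamp lookup costs O(log n) seeks and the subsequent read is a sequential scan at raw disk speed.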

    PS. I don't think Joel's comments are relevant. He left SmartQuant Ltd before we launched the QuantServer and QuantDeveloper projects, so "looking into the source code" is somewhat misleading :p

    Regards,
    Anton
     
    #116     Jul 27, 2005
    trader99

    nononsense,

    Thanks for your informative post. At the risk of sounding like a "loser", I'm just learning about DBs and Access. Looks cool and reasonable enough. Note: NOT THAT I WOULD use Access for tick data storage or anything that serious.

    I understand all the benefits of a db - security, blah blah, etc. What I don't understand is the connection between the db and the backtesting software.

    So don't you still have to pull the data out of the db and store it in some format? An array? Or some complex data structure that you populate? I'm a bit confused. So one would write SQL commands to pull the data, put it into a data structure, and then use that for tick-level backtesting?

    If you can clarify, that would help a lot. Also, isn't SQL more of an interactive prompt at the DB end, rather than something at the programming-language end like VB, C++, Java, Python, etc.? Doesn't one have to use some kind of ADO.NET or other DB API?
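    For what it's worth, the flow being asked about is roughly: the SQL query pulls rows out of the db, you load them into an in-memory structure, and the backtest loop iterates over that structure. A minimal sketch using Python's built-in sqlite3 as a stand-in for any DB API; the `ticks` table, its columns, and the VWAP computation are invented for illustration.

```python
import sqlite3
from collections import namedtuple

Tick = namedtuple("Tick", "ts price size")

# In-memory DB with a hypothetical `ticks` table standing in for the real store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticks (ts REAL, price REAL, size INTEGER)")
conn.executemany("INSERT INTO ticks VALUES (?, ?, ?)",
                 [(1.0, 100.0, 5), (2.0, 100.5, 3), (3.0, 100.25, 7)])

# 1) SQL pulls the rows out of the DB...
rows = conn.execute("SELECT ts, price, size FROM ticks ORDER BY ts")
# 2) ...which get loaded into an in-memory data structure...
ticks = [Tick(*r) for r in rows]
# 3) ...that the backtest logic then iterates over (here: a toy VWAP).
vwap = sum(t.price * t.size for t in ticks) / sum(t.size for t in ticks)
```

    In .NET the shape is the same: ADO.NET executes the SQL and hands back a reader, and you materialize the rows into your own arrays or objects before the backtest touches them.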

    please help! thanks.

    trader99
     
    #117     Nov 30, 2005
    Pondracer Guest

    I'm working on a system now that stores current data in a SQL Server db, and then I use cubes for my archives. Not sure this is the best approach, but it's what I'm familiar with. I'm a developer, but this will be my first trading app (personal use only).
     
    #118     Dec 2, 2005
    koistya

    From Wikipedia: "Although T-trees seem to be widely used for main-memory databases, recent research indicates that they actually do not perform better than B-trees on modern hardware"

    ...

    Is anyone interested in joining this project: a local market data server built on top of Microsoft SQL Server 2012 and .NET/C++/C#?

    http://github.com/kriasoft/market-data
     
    #119     Dec 9, 2012
  10. Just got done architecting an expansive equities tick repository.

    Some stats:

    Symbols: 20,653
    Period: 2008 - 2012

    Bars25ms: 27,741,118,213
    Bars1Sec: 14,007,345,833
    Bars1Min: 2,141,219,516
    Messages: 742,640,774,253
    Ask Changes: 24,825,915,500
    Bid Changes: 24,722,845,608
    Orders: 37,906,709,939
    Volume: 10,485,098,567,764

    Dedicated Servers: 5
    Data Storage: 20TB

    We chose a hybrid Hadoop-style implementation with SQL access.
    To say we were I/O bound was an understatement.

    We are now able to locate and access any tick of any instrument nearly instantaneously (<10 ms). The data is stored multiple times, with different optimizations to accelerate performance.

    Different structures are used for pairs analysis, graphing bars, index analysis, etc., with extensive use of covering indexes (where the index itself contains the answer data).
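    The covering-index idea can be demonstrated with any SQL engine; here is a minimal sketch using Python's built-in sqlite3. The `bars` table and index names are invented for illustration - the point is that when the index columns cover everything the query needs, the engine answers from the index alone without touching the table, which SQLite reports directly in its query plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bars (symbol TEXT, ts INTEGER, close REAL)")

# Covering index: (symbol, ts, close) holds every column the query below
# touches, so the base table never needs to be read for that query.
conn.execute("CREATE INDEX ix_bars_cover ON bars (symbol, ts, close)")

conn.executemany("INSERT INTO bars VALUES (?, ?, ?)",
                 [("AAPL", 1, 10.0), ("AAPL", 2, 10.5), ("MSFT", 1, 25.0)])

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT ts, close FROM bars WHERE symbol = 'AAPL'"
).fetchall()

# SQLite's plan text says "USING COVERING INDEX" when the index suffices.
covered = any("COVERING INDEX" in row[-1] for row in plan)
```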

    One of our driving forces in building out this data repository was that the commercially available consolidated data was fundamentally flawed, being built around last-trade data. Exchange tape data is too slow to process for most of our algos.

    We build our bars differently, using ask/bid changes as the trigger rather than last-trade data. Consequently, our backtested results nearly match our real-time executions. This is especially true when trading pairs and other cross-exchange correlated instruments.

    We're contemplating making access to these structures available as a service... renting out VMs with direct access to our 20TB repository... Send me a PM if you're interested.
     
    #120     Dec 9, 2012