Why use a database?

Discussion in 'Data Sets and Feeds' started by onelot, Oct 9, 2004.

  1. Kingvest


    well, high-frequency and minute data are quite controversial .. aren't they? ;)
    But it really depends on how much data you deal with and what kind of analytics you want to run.
     
    #21     Oct 14, 2004
  2. lastick


    C++ flat binary files, by far. Less code, optimized for speed, and no additional lib or DLL to implement, and all that stuff.
    But it's only a small part of the applications (graphs, systems, ATS...).
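A minimal sketch of the flat-binary approach described above, assuming a hypothetical fixed-width `Bar` record (`Bar` and `loadBars` are made-up names, not any real vendor's layout):

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical fixed-width bar record; real vendor layouts differ,
// and a portable format would also pin down endianness and padding.
struct Bar {
    int64_t ts;                      // epoch seconds
    double  open, high, low, close;
    int64_t volume;
};

// Slurp a whole flat binary file of Bar records into memory in one read.
std::vector<Bar> loadBars(const char* path) {
    std::vector<Bar> bars;
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return bars;
    std::fseek(f, 0, SEEK_END);
    long size = std::ftell(f);       // file size in bytes
    std::fseek(f, 0, SEEK_SET);
    bars.resize(size / sizeof(Bar));
    size_t got = std::fread(bars.data(), sizeof(Bar), bars.size(), f);
    bars.resize(got);                // keep only fully-read records
    std::fclose(f);
    return bars;
}
```

Everything lands in one contiguous in-memory array, which is what makes the subsequent scans fast; the trade-off is that the file format is now welded to the struct layout.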
     
    #22     Oct 14, 2004
  3. 250K * 252 days * 10 years ~= 1B arithmetical operations. My $60 Celeron will do it in a few seconds. Any institutional interested in leasing processing power from me, please PM.
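A quick sanity check of that arithmetic, as a sketch; the op count assumes roughly one multiply and one add per pair-day, and the exact count depends on the correlation formula used:

```cpp
#include <cstdint>

// The estimate above: pairs * trading days per year * years gives the
// number of pair-day terms. 250K * 252 * 10 = 630,000,000 terms, so
// counting a multiply and an add per term lands on the order of 1e9
// operations, the same order of magnitude as the quoted "~1B".
int64_t pairDayTerms(int64_t pairs, int64_t daysPerYear, int64_t years) {
    return pairs * daysPerYear * years;
}
```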
     
    #23     Oct 14, 2004
  4. linuxtrader Guest

    Without any other qualification: C++ and flat files. This of course assumes that you don't screw it up, that you use the proper in-memory structures for whatever calculations you need, and that you don't corrupt those structures.

    I have worked with several large funds and firms whose people screwed up the application of this idea in fairly spectacular ways... these were people who made it sound like it would lead to an immediate, cheap fix for large performance problems. Boy, were they wrong, and needless to say, the folks who messed up are unemployed. I think they read about the idea on some online board and thought it was a no-brainer. In large real-world systems its application is not as straightforward as it appears.
     
    #24     Oct 14, 2004
  5. The main reason for using a database, or some other transaction-oriented system, is safety.
    Atomicity of transactions, rollbacks, and audits are the only reasons to put this kind of seldom-accessed data in a database.

    Otherwise, keeping it in memory and in flat files is the best option for performance, ease of use, and administration.
     
    #25     Oct 14, 2004
  6. So, I guess a backtester written in C++/Java/C# that reads flat files containing minute or tick data is the fastest? Hmm..

    But I suppose you'll end up writing a lot of read/write infrastructure and time-series class libs, calendar awareness, etc., versus using off-the-shelf stuff like TradeStation, which is a joke.
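A taste of the "calendar awareness" plumbing that ends up getting written, sketched as a weekend-skipping helper (`isTradingDay` and `nextTradingDay` are hypothetical names; a real trading calendar also needs per-exchange holiday tables):

```cpp
#include <cstdint>

// epochDays counts days since 1970-01-01, which was a Thursday.
// Weekend-skipping only; real calendars also consult holiday tables.
bool isTradingDay(int64_t epochDays) {
    int dow = (int)((epochDays + 3) % 7);  // Mon=0 .. Sun=6
    return dow < 5;                        // Sat/Sun are not trading days
}

// Advance to the next weekday after the given day.
int64_t nextTradingDay(int64_t epochDays) {
    do { ++epochDays; } while (!isTradingDay(epochDays));
    return epochDays;
}
```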

    just wondering..
     
    #26     Oct 14, 2004
  7. kc11415


    kc11415>... This would require calculating 250,000 time-series correlations ... Very few institutionals will invest what is required for a strict calculation of beta.

    billgates>250K * 252 days * 10 years ~= 1B arithmetical operations. My $60 Celeron will do it in a few seconds. Any institutional interested in leasing processing power from me, please PM.

    So does that constitute an offer to make available a service to calculate true beta for all components of the S&P500 in near real-time for $60? Since it will only take a few seconds for you to run, how about you test your performance claim and tell us what you actually measure for the run-time. After all, what institutional is going to buy your service if you can't demonstrate it? ;-)
     
    #27     Oct 14, 2004
  8. marist89


    Boy, you guys like to write a lot of code. While you have to write hundreds of lines of code, I just have to SELECT * FROM tickData WHERE sample_dt > SYSDATE-10. Good luck to you.
     
    #28     Oct 14, 2004
  9. I agree. Using a DB with an existing class library like ADO in .NET is the easiest to program, though certainly not as efficient as flat files. With flat files, you have to write your own class library to manipulate them.
     
    #29     Oct 14, 2004
  10. linuxtrader Guest

    Exactly: if you don't need to make things difficult, then don't. Most applications can just use a database and SQL statements to access the data. If the app can't use database access code, then there is more work to do.
     
    #30     Oct 14, 2004