I agree, but I think that applies mostly to a corporate, team-oriented environment (unless, of course, you have a personal super-duper parallel system that receives multiple price feeds and needs to evaluate and execute ultra-short-term strategies in real time). If you're just a lone ranger, I think the most important consideration in deciding whether you need a db is your querying needs - that is, whether you need to selectively read in only chunks of your data at a time. That's usually due to one of two things:
1) Lack of main memory: e.g. you want to evaluate a strategy on intraday data that spans several years and your memory can only hold one week's worth of data at a time.
2) The nature of your strategy requires you to select subsets of the data at a time, e.g. your strategy involves choosing only the top 100 most active stocks from the Russell 3K each day (sketched below).
Otherwise, like Gringinho said, I think the easiest and fastest way to go is to store your data in flat files and then stuff everything into main memory when you need to work with it.
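A minimal sketch of the "flat files plus selection" idea from point 2, assuming one CSV file per trading day with symbol/OHLC/volume columns (the file layout, column names, and function name are made up for illustration, not anyone's actual setup):

```python
# Hypothetical layout: one file per trading day, e.g. "2003-11-06.csv",
# with rows of symbol,open,high,low,close,volume.
import csv
from pathlib import Path

def top_n_by_volume(day_file: Path, n: int = 100):
    # Load the whole day's file into memory, then keep the n most active names.
    with day_file.open(newline="") as f:
        rows = list(csv.DictReader(f))
    rows.sort(key=lambda r: int(r["volume"]), reverse=True)
    return rows[:n]

# Usage: the 100 most active Russell 3K names for one day.
# most_active = top_n_by_volume(Path("data/2003-11-06.csv"))
```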
I would rather use some form of DB access class like ADO.Net than write tons of functions to manipulate flat files. ACK! That would mean you gotta move back and forth in the file for backtesting purposes with dates etc. Geez. I have not used ADO before, but is it true to say that grabbing mm1/dd1/yyyy1 to mm2/dd2/yyyy2 would be trivial in ADO once you have stored your data in date/OHLC format?
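Yes, a date-range pull is a one-liner once the data sits in a table. Here is the same idea sketched with Python's sqlite3 rather than ADO.Net (the table name "daily_ohlc" and its columns are assumptions for illustration only):

```python
import sqlite3

def fetch_range(conn: sqlite3.Connection, start: str, end: str):
    # Grab all bars between two dates, inclusive, in chronological order.
    cur = conn.execute(
        "SELECT date, open, high, low, close FROM daily_ohlc "
        "WHERE date BETWEEN ? AND ? ORDER BY date",
        (start, end),
    )
    return cur.fetchall()

# conn = sqlite3.connect("quotes.db")
# bars = fetch_range(conn, "2002-01-01", "2003-12-31")
```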
I strongly believe you actually mean that you would hold any data you test or manipulate in memory - and NOT on file or in a database for individual record updates (yuck!!). So the manipulation part ends up fairly equal, while the storage part is what differs. Well, that is if you don't need transactional security; do you?!
Safety.... Well, yes, but.... It sure is easy to just run a script to import data into a database system and then, once it is there, declare who can access the data. Pretty simple to back up too. Having everything in one central and secure place is a lot easier, in my opinion, than manipulating many separate files... just my opinion. And by the way, lots of database reporting architectures exist to house seldom-accessed data, if the data itself and any enhancements made to it are of value. In my development I only work with data loaded into memory outside of a database system if there is no other choice due to performance issues. The preferred solution is always centralized data storage in a modern database system, and unless your app is just a hobby for one person, you won't find many people happy with a solution based on flat files - unless there is no other choice due to budget or technical issues.
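For what it's worth, the "run a script to import data" step really is short. A hedged sketch using sqlite3, bulk-loading a CSV of daily bars (table name, columns, and file format are invented for the example):

```python
import csv
import sqlite3

def load_csv(conn: sqlite3.Connection, csv_path: str) -> None:
    # Create the target table once, then bulk-insert every row from the CSV.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_ohlc "
        "(symbol TEXT, date TEXT, open REAL, high REAL, low REAL, "
        " close REAL, volume INTEGER)"
    )
    with open(csv_path, newline="") as f:
        rows = [
            (r["symbol"], r["date"], float(r["open"]), float(r["high"]),
             float(r["low"]), float(r["close"]), int(r["volume"]))
            for r in csv.DictReader(f)
        ]
    conn.executemany("INSERT INTO daily_ohlc VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
```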
Sure, IBM DB2 Universal Server and its data-mining add-ons are well suited, for example. Proper backups and storage are of course part of the same responsible attitude required for all types of IT work - regardless of StorageWorks libraries, tape robots, CD-RW, whatever. I think this is the preferred way for all institutional, corporate users. For a one-man-show operation, though, running and updating an advanced DB solution requires a lot of work and a lot of DBA knowledge. I used to admin a VoIP call-record Oracle 8i db on SunOS, as well as an Oracle-based backoffice system for a travel agency network on HP-UX. It was a lot of data and batch work, and not at all trivial. I spent a lot of time and effort getting the right performance out of those systems, using striped volumes and all kinds of platform tweaks and patches, not to mention all the SQL hints and index tuning needed as the data gets bloated. My experience tells me that I will not do the same for a personal trading system - not in a million years. Those who rely on professional consultants and need the other DB benefits could of course do it, but it's just too much overkill for simple backtesting and strategies for a single trader.
Well... perhaps.... Even for a lone trader who works with a lot of strategies and knows how to deal with SQL, it should not be a problem, even with hundreds of gigabytes of data. Most kids out of college can manage it with today's tools/systems. It just depends on what you want to do... and where you want to go. If all you ever want to do is work by yourself, then anything is OK. And I agree, working with a database can be a bottomless pit of time if you don't know what you are doing, or if the people who designed it didn't know what they were doing....
Tick data is hard to work with efficiently no matter what data structure you use. IMHO, the only reason to stop using a transactional database as your backing store is if the total size of the database gets out of hand. So it depends on time resolution, number of symbols, how much history you want, and how much you want to spend on disks and backup media. So I guess we're back to "it depends on the application." It's not a matter of computing power, it's a matter of statistical validity. The institutionals do not calculate the covariance matrix of the S&P 500 because it would not be statistically valid: there aren't enough degrees of freedom relative to the number of entries in a 500x500 covariance matrix. Using more history gives you better statistical validity but worse predictive value, since markets change over time. Roughly speaking, you would need 10,000 data points per stock for a valid covariance matrix of the S&P 500. That's almost 40 years of daily data. Martin
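A quick back-of-the-envelope check of those numbers (the observations-per-parameter framing is my own illustration of the degrees-of-freedom point, not the poster's):

```python
# Count the free parameters in a 500x500 covariance matrix and compare
# against the "10,000 data points per stock" figure quoted above.
n_stocks = 500
cov_params = n_stocks * (n_stocks + 1) // 2      # 125,250 distinct entries to estimate
days_per_stock = 10_000                          # figure quoted in the post
total_points = n_stocks * days_per_stock         # 5,000,000 observations in total

print(cov_params)                  # 125250
print(total_points / cov_params)   # ~40 observations per estimated parameter
print(days_per_stock / 252)        # ~39.7 years of daily bars ("almost 40 years")
```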
You're so right! I forgot to mention that I always think in terms of tick data and products like ES, EuroFX etc., but of course 1-min, 5-min or larger OHLC data will not be so much of a challenge. Good point. OTOH, if you're talking 20k symbols or something, you're back in the same territory... just like with tick data.
From my journal:
"------------------------------------------------------------------------
11 Nov 03
------------------------------------------------------------------------
Testing MySQL. Populated with 101M intraday records. Time 5.3 hours, disk space 4.3G, index 2.4G.
Query to select 22000 records - 300 sec. Query to select 1600 records - 3 sec.
Added 2 indexes ('open' and 'price') - 4 hours, 4.3G disk space.
"SELECT * FROM `rtquotes` where date='2003-11-06' and open>200 and open < 210" - 300 sec
Conclusion: not suitable (too slow)"
The same query would cost me only about a second with flat files.
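The flat-file side isn't shown in the post; here is a minimal sketch of why it can be near-instant, assuming the quotes are partitioned into one CSV file per trading day (file naming, columns, and the query_day helper are my assumptions): the date predicate selects a single small file, so only that day's records are ever scanned.

```python
import csv
from pathlib import Path

def query_day(data_dir: Path, date: str, lo: float, hi: float):
    # The date filter picks exactly one file; only that day's rows get scanned.
    day_file = data_dir / f"rtquotes-{date}.csv"
    with day_file.open(newline="") as f:
        return [r for r in csv.DictReader(f) if lo < float(r["open"]) < hi]

# rows = query_day(Path("quotes"), "2003-11-06", 200.0, 210.0)
```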
Effective caching in memory will ALWAYS help. Yes. See this post for information on effective algorithmic improvements: http://www.elitetrader.com/vb/showthread.php?s=&postid=594912#post594912 Are you talking about profiling? One should always profile their code. Here is a nice and inexpensive profiler for EXEs (with or without embedded debugging information): http://www.lw-tech.com/
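A small illustration of the "cache it in memory" point: memoize a per-day loader so repeated backtest passes only hit the disk once per file. The loader is a stand-in for illustration, not anyone's actual code.

```python
import csv
from functools import lru_cache

@lru_cache(maxsize=None)
def load_day(path: str) -> tuple:
    # First call reads the file; later calls with the same path return the
    # cached, immutable snapshot straight from memory.
    with open(path, newline="") as f:
        return tuple(tuple(row) for row in csv.reader(f))

# bars = load_day("quotes/2003-11-06.csv")   # disk read
# bars = load_day("quotes/2003-11-06.csv")   # served from the in-memory cache
```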