Hey Prophet, I think we're in violent agreement here. I'm not against optimization; I just think that someone starting out with their implementation shouldn't worry about it right off the bat. As you say, your 90/10 rule expresses much the same idea, so I don't think we're that far apart.

To get back to the original question of the thread, that's why I think it's better to start with a database, which does an enormous amount of work for you: atomicity, a data model, a client-server model, and so on, though, as you point out, not the best performance. It's a lot easier to take a good database and move the crucial bits into flat files. For example, in my system all the data lives in a database but is cached in flat files. If the database gets too huge I might need to migrate all the data into flat files permanently and keep only metadata in the database. But no matter where that ends up, the database will continue to provide all kinds of advantages. For example, I do data capture on a Windows box and analysis on Linux, and my life is a lot easier because Postgres takes care of the networking, locking, transactions, notifications, and the like, which would be a lot of work if I had to develop my own client-server model.

It's just kind of amusing for me to see the heavy focus on fancy hardware and algorithms here when you can accomplish great things with simple algorithms, obsolete hardware, and a (God forbid) CRT monitor instead of a flat panel! I think that pushing the limits of performance will often obscure the underlying issues that affect profitability. One example is statistical validity: the more data you have and the more you analyze it, the easier it is to overfit and end up with serious implementation shortfall as a result. I've seen this again and again, and in such cases throwing more hardware or data at the problem only makes it worse.

Martin
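For anyone who wants a concrete picture of the database-to-flat-file caching step described above, here is a minimal sketch. It assumes psycopg2 and an entirely hypothetical `ticks` table; the connection string, table, and column names are made up for illustration, and this is not the poster's actual code.

```python
# Sketch: pull one day of captured data out of Postgres and cache it as a
# flat CSV file, so later analysis runs can skip the database round trip.
# Table name, columns, and connection settings are all hypothetical.
import os
import psycopg2

CACHE_DIR = "cache"

def cache_day(day: str) -> str:
    """Dump one day of the (hypothetical) ticks table to a flat file and return its path."""
    path = os.path.join(CACHE_DIR, f"ticks_{day}.csv")
    if os.path.exists(path):          # already cached -> reuse the flat file
        return path
    os.makedirs(CACHE_DIR, exist_ok=True)
    conn = psycopg2.connect("dbname=marketdata host=capturebox")
    try:
        with conn.cursor() as cur, open(path, "w") as out:
            # COPY streams the result set straight into the file; Postgres
            # takes care of the locking/transaction details for us.
            # (The date is interpolated as a literal only because this is a sketch.)
            cur.copy_expert(
                "COPY (SELECT ts, symbol, price, size FROM ticks "
                f"WHERE ts::date = '{day}' ORDER BY ts) "
                "TO STDOUT WITH CSV HEADER",
                out,
            )
    finally:
        conn.close()
    return path

if __name__ == "__main__":
    print(cache_day("2007-03-15"))
```

The point of the shape here is that the database stays the source of truth while the flat file is a disposable cache: if it exists, read it; if not, regenerate it.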
So it's amusing that people waste their time tweaking and reinventing the wheel without achieving the same result.
Right on the money: the things that make money in the markets and elsewhere are based on an idea, not a plethora of available computing power. ... And yes, to the dismay of some of the other posters, it is possible to construct an algorithm that converges rapidly to an answer on a large dataset in minutes or tens of minutes on commodity hardware. The great unspoken secret of the technology business is that there is much, much more raw computing power available at low prices right now than anyone needs. True, I could make use of it all with some gigantic simulation, but the things that are lacking right now, and that will be valuable in the future, are insights, not just another excuse to write code to soak up processor cycles.
He is not ... and neither is the other debater. The whole argument is well known. You can produce highly optimized code that will run fairly complicated optimizations in relatively small amounts of time: the trick is to match the technique to the problem and to understand how to optimize for your hardware, like the other poster said about matching the cache and pipelines with the data/instructions. You can make the analysis as complicated as you want ... Just remember that large funds and banks have a different agenda for doing time-consuming calculations, one that often has little to do with the efficacy or applicability of the approach. There is a whole ecosystem built around time-consuming (and often obfuscating) computational approaches, and the result often has little positive effect on the bottom line.
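To make the "matching the cache with the data" point concrete, here is a small, purely illustrative timing comparison. It assumes NumPy, the array size is arbitrary, and the exact numbers will depend on the machine and on NumPy's reduction strategy; the only claim is that the same arithmetic over contiguous versus strided memory behaves very differently in the cache.

```python
# Identical work, different memory layout: summing along rows of a
# row-major (C-order) array walks contiguous memory, while doing the
# same on a column-major (Fortran-order) copy strides through it.
import time
import numpy as np

a = np.random.rand(4000, 4000)     # C order: each row is contiguous in memory
b = np.asfortranarray(a)           # same values, column-major layout

t0 = time.perf_counter()
row_major_sums = a.sum(axis=1)     # contiguous reads -> cache friendly
t1 = time.perf_counter()
col_major_sums = b.sum(axis=1)     # strided reads -> many more cache misses
t2 = time.perf_counter()

assert np.allclose(row_major_sums, col_major_sums)
print(f"contiguous: {t1 - t0:.3f}s   strided: {t2 - t1:.3f}s")
```

Same answer either way; the layout that matches the access pattern is simply cheaper, which is the kind of "free" optimization the poster is talking about.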
Similar story with MySQL databases on Linux ... and MySQL is free while Oracle is not ... As far as answering a laundry list of inquiries about database configuration and query execution goes, my advice to the other poster who asked is to read their Oracle documentation: all of those issues are discussed there, if they need confirmation of how to handle implementation and optimization - which you graciously answered ...
Sorry people, it's hard to believe these numbers. Raw hard disk read speed on my PC is about 10MB/sec. 850K records/sec at 20 bytes/record is 17MB/sec. And in the case of a database you also have to hit several index files and then read the data from the table files non-sequentially ...
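The arithmetic behind that skepticism, spelled out (the 10MB/sec figure is just this poster's stated drive speed):

```python
# Back-of-the-envelope check on the claimed rates.
records_per_sec = 850_000
bytes_per_record = 20
required_mb_per_sec = records_per_sec * bytes_per_record / 1_000_000   # 17.0 MB/s
stated_disk_mb_per_sec = 10                                            # poster's drive

print(f"claim implies {required_mb_per_sec:.1f} MB/s delivered, "
      f"about {required_mb_per_sec / stated_disk_mb_per_sec:.1f}x the raw disk rate")
```

In other words, at those figures the data can't all be streaming off a 10MB/sec disk; the numbers in the next reply only make sense if most of the working set is already sitting in RAM or the database cache.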
Take a look at the basic industry benchmarks then .. TPC etc. The numbers for much more expensive operations than a covered select run around 750K - 1M operations per minute on commodity hardware. A properly optimized select on a properly configured DB server - one tuned for reporting queries - does much better than that ... and FYI, the free MySQL database can beat or closely match $racle, $soft, and $bm database servers on some types of operations ...
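For readers who haven't run into the term, a "covered select" is a query the engine can answer entirely from an index, without ever touching the base table rows. A tiny demonstration follows; it uses SQLite only because it ships with Python (the benchmarks above concern MySQL/Oracle-class servers), and the table and column names are invented.

```python
# Minimal "covered select" demo: the index contains every column the query
# needs, so the engine never reads the base table.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE quotes (symbol TEXT, ts INTEGER, price REAL)")
con.execute("CREATE INDEX idx_quotes_cover ON quotes (symbol, ts, price)")

plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT ts, price FROM quotes WHERE symbol = ?",
    ("IBM",),
).fetchall()
print(plan)   # the plan should mention 'USING COVERING INDEX idx_quotes_cover'
```

That is the kind of access path the benchmark numbers assume: an index-only lookup that never has to chase row data scattered across the table files.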