best way to hack up an rdbms to approximate something like kdb for poor/dumb people

Discussion in 'Automated Trading' started by garchbrooks, Mar 23, 2010.


  1. You don't have tick data. You have 1-minute data for 3000 stocks. Tick data is different from 1-minute data.

    Take each of the 3000 symbols and create one database or table per symbol. Call each database/table something like CSCO1Min, DELL1Min, etc.

    Then if you want to create 15-minute databases you can create CSCO15Min, DELL15Min, etc.

    Divide and conquer the problem space into smaller components instead of dealing with the whole crapload of data.
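
    Rough sketch of that layout (the OHLCV column names are made up here, use whatever your feed actually gives you):

    -- one MyISAM table per symbol and interval; repeat for each of the 3000 symbols
    CREATE TABLE CSCO1Min (
        dt     DATETIME      NOT NULL,   -- bar timestamp
        open   DECIMAL(10,4) NOT NULL,
        high   DECIMAL(10,4) NOT NULL,
        low    DECIMAL(10,4) NOT NULL,
        close  DECIMAL(10,4) NOT NULL,
        volume INT UNSIGNED  NOT NULL
    ) ENGINE=MyISAM;

    -- same layout for the coarser bars; DELL1Min, DELL15Min, etc. follow the same pattern
    CREATE TABLE CSCO15Min LIKE CSCO1Min;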
     
    #31     Mar 24, 2010
  2. Will do. Thanks Futs and Darth.

    Your help has been invaluable.
     
    #32     Mar 24, 2010
  3. Of course. Setup:
    A_m table (mindata of A)
    B_m table (mindata of B)
    getTuple("A", "B", from, to) <- stored proc

    getTuple can run the query and return the result set. It will scan only two tables, as opposed to the many hundreds of millions of rows you have in total.

    On the minute-data tables I have NO keys. They are MyISAM tables, and I make sure the rows are inserted in order. This is subject to experimentation and change.
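
    Roughly like this with dynamic SQL, since a MySQL stored proc can't take a table name as an ordinary parameter. The (dt, close) columns are just placeholders for whatever the _m tables really hold:

    DELIMITER //
    CREATE PROCEDURE getTuple(IN symA VARCHAR(16), IN symB VARCHAR(16),
                              IN t_from DATETIME, IN t_to DATETIME)
    BEGIN
        -- build "SELECT ... FROM A_m a JOIN B_m b ..." at runtime from the symbol names
        SET @sql = CONCAT(
            'SELECT a.dt, a.close AS close_a, b.close AS close_b ',
            'FROM ', symA, '_m a JOIN ', symB, '_m b ON a.dt = b.dt ',
            'WHERE a.dt BETWEEN ? AND ? ORDER BY a.dt');
        PREPARE stmt FROM @sql;
        SET @f = t_from, @t = t_to;
        EXECUTE stmt USING @f, @t;          -- returns the joined rows as the result set
        DEALLOCATE PREPARE stmt;
    END //
    DELIMITER ;

    -- example call
    CALL getTuple('A', 'B', '2010-03-01 09:30:00', '2010-03-01 16:00:00');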
     
    #33     Mar 24, 2010
  4. Sometimes it isn't about dazzling everyone with hardware bullshit and going hardcore like some people do.

    It's about really thinking long and hard about the problem at hand and THEN spending money on hardware, if you need it, to speed up your solution.

    You may just find that building a quote database and running queries on it shows you very little for the time you invest, or that a modest setup is good enough. No use making it worse by spending thousands of dollars on hardware just because you were so excited about the project.

    Just use 64-bit Ubuntu and MySQL Cluster 7: 64-bit Ubuntu so you can address large amounts of physical memory, and Cluster so you can keep the tables in memory, make full use of the 64-bit address space, and reduce I/O.
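
    For example, assuming a working Cluster install (column names made up), putting a table on the NDB engine is what keeps it resident in the data nodes' RAM:

    -- NDB tables live in the data nodes' memory, so range scans avoid most disk I/O
    CREATE TABLE CSCO1Min_mem (
        dt     DATETIME      NOT NULL,
        close  DECIMAL(10,4) NOT NULL,
        volume INT UNSIGNED  NOT NULL,
        PRIMARY KEY (dt)
    ) ENGINE=NDBCLUSTER;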

    Make sense?
     
    #34     Mar 24, 2010
  5. Absolutely. (Plus, I'm not comfortable spending money on something I don't fully understand. I figure an unnecessary hardware purchase is just as bad as losing money on the market.)
     
    #35     Mar 24, 2010
  6. A correction: the MyISAM tables (with minute data) do have a key on (date, time). Also, I just tested with InnoDB tables. For typical queries (from datetime to datetime), MyISAM is faster, so I will stick with it for now.
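
    Something along these lines, if you want to reproduce the comparison (the non-key columns are placeholders):

    CREATE TABLE A_m (
        d      DATE          NOT NULL,
        t      TIME          NOT NULL,
        close  DECIMAL(10,4) NOT NULL,
        volume INT UNSIGNED  NOT NULL,
        KEY k_date_time (d, t)              -- the (date, time) key mentioned above
    ) ENGINE=MyISAM;

    -- a typical "from datetime to datetime" range query to time against both engines
    SELECT *
    FROM A_m
    WHERE (d, t) >= ('2010-03-01', '09:30:00')
      AND (d, t) <= ('2010-03-05', '16:00:00')
    ORDER BY d, t;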
     
    #36     Mar 24, 2010
  7. A suggestion before you get too carried away with this...
    You have to know what you are looking for, and most of the time, it won't be there.
     
    #37     Mar 24, 2010
  8. garch, where are you getting minute data on stocks / what is a good place to get it from? Thanks
     
    #38     Mar 24, 2010
  9. Sent a PM, but the general answer is that I put a DLL in a retail broker platform with good historical data and funneled the data out.
     
    #39     Mar 24, 2010
  10. Garchbrooks,

    I have been reading your thread with some interest. I used to know everything about databases. I'm a retired Database Administrator (DB2 and Oracle), but now I trade full time so I can forget about restoring DBs at 2:00 A.M. every other day. I'm not a SQL Server guru; I'm a mainframe DBA. But I will help if you want it.

    I see that you are trying to do the 'impossible': using a PC (they still don't match the bigger hardware) to run a monster app against big databases. As you found out, you are going to be tied up with I/O trying to access all the data in raw form. Unless you have an absolute need for the raw data, do what they do in real apps and condense it into something usable.

    The first question I always get is why a PC can't do what the mainframe does, so I will answer it in advance. On a mainframe you would spread your data across multiple disk packs. Next, the data would be partitioned across the packs. When you ran your query, each pack would grab part of the data. Imagine 20 packs working in parallel on the I/O. That is a quick overview of what you can't do on a PC.
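
    The closest single-box analog MySQL gives you is range partitioning: it won't buy you 20 packs' worth of parallel I/O, but it does limit how much data a date-bounded query has to touch. A rough sketch with made-up columns:

    CREATE TABLE quotes_1min (
        d      DATE          NOT NULL,
        t      TIME          NOT NULL,
        symbol CHAR(8)       NOT NULL,
        close  DECIMAL(10,4) NOT NULL,
        KEY k_sym_dt (symbol, d, t)
    ) ENGINE=MyISAM
    PARTITION BY RANGE (TO_DAYS(d)) (
        PARTITION p2010q1 VALUES LESS THAN (TO_DAYS('2010-04-01')),
        PARTITION p2010q2 VALUES LESS THAN (TO_DAYS('2010-07-01')),
        PARTITION pmax    VALUES LESS THAN MAXVALUE
    );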

    I’m very familiar with processing this type of data.
     
    #40     Mar 24, 2010