You don't have tick data. You have 1-minute data for 3000 stocks, and tick data is different from 1-minute data. Take each of the 3000 symbols and create one database per symbol, or one table per symbol. Call each database/table something like CSCO1Min, DELL1Min, etc. Then if you want to create 15-minute databases you can create CSCO15Min, DELL15Min, etc. Divide and conquer the problem space into smaller components instead of dealing with the whole crapload of data at once.
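A minimal sketch of that per-symbol layout, using SQLite in place of MySQL just for illustration. The table names follow the scheme suggested above; the bar values are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One 1-minute table per symbol instead of one giant all-symbols table.
for sym in ("CSCO", "DELL"):
    conn.execute(
        f"CREATE TABLE {sym}1Min "
        "(dt TEXT, open REAL, high REAL, low REAL, close REAL, volume INTEGER)"
    )

# A few sample 1-minute bars for CSCO (invented values, dt = 'YYYY-MM-DD HH:MM').
bars = [
    ("2010-03-01 09:30", 24.10, 24.15, 24.05, 24.12, 1000),
    ("2010-03-01 09:31", 24.12, 24.20, 24.10, 24.18, 1200),
    ("2010-03-01 09:45", 24.18, 24.25, 24.15, 24.22, 900),
]
conn.executemany("INSERT INTO CSCO1Min VALUES (?,?,?,?,?,?)", bars)

# Derive the 15-minute table by bucketing minutes into 15-minute bins:
# open = first open, high = max, low = min, close = last close, volume = sum.
conn.execute("CREATE TABLE CSCO15Min "
             "(dt TEXT, open REAL, high REAL, low REAL, close REAL, volume INTEGER)")

buckets = {}
for dt, o, h, l, c, v in conn.execute("SELECT * FROM CSCO1Min ORDER BY dt"):
    minute = int(dt[14:16])
    # Truncate the minute to the start of its 15-minute bucket.
    bucket = dt[:14] + "%02d" % (minute // 15 * 15)
    if bucket not in buckets:
        buckets[bucket] = [o, h, l, c, v]
    else:
        b = buckets[bucket]
        b[1] = max(b[1], h)
        b[2] = min(b[2], l)
        b[3] = c
        b[4] += v

conn.executemany("INSERT INTO CSCO15Min VALUES (?,?,?,?,?,?)",
                 [(k, *vals) for k, vals in buckets.items()])
```

In a real MySQL setup you'd do this once per symbol in a loop, or generate the DDL from your symbol list.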
Of course. Set up an A_m table (minute data of A) and a B_m table (minute data of B), then make getTuple("A", "B", from, to) a stored proc. getTuple can run the query and return the resultset; it will scan two tables only, as opposed to the many hundreds of millions of rows you have. On the minute data tables I have NO keys. They are MyISAM tables, and I ensure they are ordered. This is subject to experimentation and change.
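A rough sketch of the getTuple idea, again in Python/SQLite rather than an actual MySQL stored procedure; the symbols, timestamps, and prices are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for sym in ("A", "B"):
    conn.execute(f"CREATE TABLE {sym}_m (dt TEXT, close REAL)")
conn.executemany("INSERT INTO A_m VALUES (?, ?)",
                 [("2010-03-01 09:30", 10.0), ("2010-03-01 09:31", 10.1)])
conn.executemany("INSERT INTO B_m VALUES (?, ?)",
                 [("2010-03-01 09:30", 20.0), ("2010-03-01 09:31", 19.9)])

def get_tuple(conn, sym_a, sym_b, dt_from, dt_to):
    """Join the two per-symbol minute tables on timestamp for a range.

    Stands in for the getTuple stored proc described above: only the two
    small per-symbol tables are scanned, not one giant combined table.
    Symbols are interpolated into table names, so they must be trusted input.
    """
    sql = (f"SELECT a.dt, a.close, b.close "
           f"FROM {sym_a}_m a JOIN {sym_b}_m b ON a.dt = b.dt "
           f"WHERE a.dt BETWEEN ? AND ? ORDER BY a.dt")
    return conn.execute(sql, (dt_from, dt_to)).fetchall()

rows = get_tuple(conn, "A", "B", "2010-03-01 09:30", "2010-03-01 09:31")
```

The stored-proc version would build the same two-table join from its symbol arguments and return the resultset directly.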
Sometimes it isn't about dazzling everyone with hardware bullshit and going hardcore like some people do. It's about really thinking long and hard about the problem at hand and THEN spending money on hardware, if you need it, to speed up your solution. You may just find that building a quote database and running queries on it shows you very little for your invested time, or that it's already good enough. No use making it worse by spending thousands of dollars on hardware just because you were so excited about the project. Just use 64-bit Ubuntu and MySQL Cluster 7: Ubuntu 64-bit to address large physical memory, and Cluster so that you can keep the tables in-core and fully use the 64-bit address space, reducing I/O. Make sense?
Absolutely. (Plus, I'm not comfortable spending money on something I don't fully understand. I figure an unnecessary hardware purchase is just as bad as losing money on the market.)
A correction: the MyISAM tables (with minute data) do have a key on (date, time). Also, I just now tested with InnoDB tables. For typical queries (from datetime to datetime), MyISAM is faster, so I will stick with it for now.
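For what that key buys you, here's a small sketch of the (date, time) composite key and a typical from-datetime-to-datetime range query, in SQLite for illustration (values invented; the row-value comparison syntax works in SQLite 3.15+ and MySQL alike):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CSCO1Min (date TEXT, time TEXT, close REAL)")
# Composite key on (date, time), as in the corrected MyISAM setup.
conn.execute("CREATE INDEX idx_date_time ON CSCO1Min (date, time)")
conn.executemany("INSERT INTO CSCO1Min VALUES (?,?,?)", [
    ("2010-03-01", "09:30", 24.12),
    ("2010-03-01", "09:31", 24.18),
    ("2010-03-02", "09:30", 24.40),
])

# Typical range query; with the composite index the engine can do a
# range scan on (date, time) instead of a full table scan.
rows = conn.execute(
    "SELECT close FROM CSCO1Min "
    "WHERE (date, time) >= (?, ?) AND (date, time) <= (?, ?) "
    "ORDER BY date, time",
    ("2010-03-01", "09:31", "2010-03-02", "09:30")).fetchall()
```

MyISAM vs. InnoDB won't change the query, only how the engine executes it, which is why it's worth benchmarking both as described above.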
A suggestion before you get too carried away in this... You have to know what you are looking for, and most of the time, it won't be there.
Sent PM; but the general answer is that I put a DLL in a retail broker platform with good historical data and funneled the data out.
Garchbrooks, I have been reading your thread with some interest. I used to know everything about databases. I'm a retired database administrator (DB2 and Oracle), but now I trade full time to forget restoring DBs at 2:00 A.M. every other day. I'm not a SQL Server guru, I'm a mainframe DBA, but I will help if you want it.

I see that you are trying to do the "impossible": using a PC (they still don't match the bigger hardware) to run a monster app against big databases. As you found out, you are going to be tied up with I/O trying to access all the data in raw form. Unless you have an absolute need for raw data, do what they do in real apps and condense it into something that is usable.

The first question I always get is why a PC can't do what the mainframe does, so I will answer it in advance. On a mainframe you would spread your data across multiple disk packs. Next, the data would be partitioned across the packs. When you ran your query, each pack would grab part of the data. Imagine 20 packs working in parallel on the I/O. That is a quick overview of what you can't do on a PC. I'm very familiar with processing this type of data.
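You can approximate the spirit of that on a PC, even if not the hardware. A toy sketch of the partitioning idea: split the data into independent files ("packs"), let each worker scan only its own partition, then merge the results. Everything here (file names, bar values, partition count) is invented for illustration:

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Build 4 small partition files, each holding its own slice of the data.
tmpdir = tempfile.mkdtemp()
partitions = []
for i in range(4):
    path = os.path.join(tmpdir, f"pack{i}.db")
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE bars (dt TEXT, close REAL)")
    db.executemany("INSERT INTO bars VALUES (?, ?)",
                   [(f"2010-03-01 09:{m:02d}", 10.0 + i + m / 100)
                    for m in range(30, 35)])
    db.commit()
    db.close()
    partitions.append(path)

def scan(path):
    # Each worker opens its own connection and scans only its partition,
    # like each disk pack grabbing its part of the data.
    db = sqlite3.connect(path)
    (n,) = db.execute(
        "SELECT COUNT(*) FROM bars WHERE dt >= '2010-03-01 09:32'").fetchone()
    db.close()
    return n

# Scan all partitions concurrently and merge the partial results.
with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(scan, partitions))
```

On one spindle the parallelism is mostly illusory, which is the poster's point; but with partitions on separate disks (or separate machines), the scans genuinely overlap.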