If I also backtest using hourly or EOD data, do you think I should create separate tables for them, instead of filtering with something like WHERE EXTRACT(MINUTE_SECOND FROM Timestamp) = 0, which I expect would be slow?
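Concretely, I mean a query like this (just a sketch, assuming a hypothetical one-minute table bars_1min; rows stamped exactly on the hour have MINUTE_SECOND = 0):

-- Sample hourly rows out of the one-minute table.  Wrapping the column
-- in a function means MySQL cannot use an index on ts, so this scans
-- every one-minute row.
SELECT symbol, ts, open, high, low, close, volume
FROM bars_1min
WHERE EXTRACT(MINUTE_SECOND FROM ts) = 0;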
It depends on the Bloomberg data. Does their data always have an entry for nn:00 even if there was no transaction during that minute? Also, separate tables for hourly or EOD bars would be significantly smaller and easier to work with, and you can build them from the one-minute data in one pass, as sketched below. I would suggest starting small, say with the Dow 30 stocks: build your database and queries, and once that is working, move on to the full 1000 stocks.
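A minimal sketch for MySQL 8+, assuming a hypothetical one-minute table bars_1min(symbol, ts, open, high, low, close, volume):

-- Build an EOD table from the one-minute bars.  The window functions pick
-- the first open and last close of each (symbol, day) partition; DISTINCT
-- then collapses each partition to a single row.
CREATE TABLE bars_eod AS
SELECT DISTINCT
       symbol,
       DATE(ts)                 AS trade_date,
       FIRST_VALUE(open) OVER w AS open,    -- open of the first minute bar
       MAX(high)         OVER w AS high,
       MIN(low)          OVER w AS low,
       LAST_VALUE(close) OVER w AS close,   -- close of the last minute bar
       SUM(volume)       OVER w AS volume
FROM bars_1min
WINDOW w AS (PARTITION BY symbol, DATE(ts)
             ORDER BY ts
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);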
Use materialized views for longer time-frame bars aggregated from the one-minute data. This is a brain-dead-simple query to base a materialized view on. Perhaps a basic course on SQL and RDBMS at your local community college is in order.
Here below is a materialized view aggregating data from an FX five-minute bar base table into a one-hour bar materialized view. The code is Oracle SQL, but a MySQL version should be similar:

CREATE MATERIALIZED VIEW vFX1Hour
  TABLESPACE PERMTBS08
  BUILD IMMEDIATE
  REFRESH FORCE ON DEMAND
AS
SELECT sdate                                    AS SDate,
       symbol                                   AS Symbol,
       trunc(baridx/12)                         AS BarIdx,
       mod(min(nycbarts+baridx*100000),100000)  AS BarTS,
       mod(min(openbid+baridx*100000),100000)   AS OpenBid,
       mod(min(openask+baridx*100000),100000)   AS OpenAsk,
       (mod(min(openbid+baridx*100000),100000)
        + mod(min(openask+baridx*100000),100000))/2 AS OpenMid,
       max(high)                                AS High,
       min(low)                                 AS Low,
       mod(max(lastbid+baridx*100000),100000)   AS LastBid,
       mod(max(lastask+baridx*100000),100000)   AS LastAsk,
       (mod(max(lastbid+baridx*100000),100000)
        + mod(max(lastask+baridx*100000),100000))/2 AS LastMid,
       sum(ticks)                               AS Ticks,
       avg(avgbidsize)                          AS AvgBidSize,
       avg(avgasksize)                          AS AvgAskSize,
       max(maxsize)                             AS MaxSize,
       min(minsize)                             AS MinSize,
       avg(avgspread)                           AS AvgSpread,
       max(maxspread)                           AS MaxSpread,
       min(minspread)                           AS MinSpread,
       sum(newhighs)                            AS NewHighs,
       sum(newlows)                             AS NewLows,
       sum(changecmo)                           AS ChangeCmO,
       sum(twap*ticks)/sum(ticks)               AS TWAP,
       sum(vwap*((avgbidsize+avgasksize)/2))
         / sum((avgbidsize+avgasksize)/2)       AS VWAP,
       to_number(substr(max(10000000000+high*100000||'_'||(1000000000+hightstamp)),-9)) AS HighTStamp,
       to_number(substr(min(10000000000+low*100000||'_'||(1000000000+lowtstamp)),-9))   AS LowTStamp
  FROM FX5Min
 WHERE FType = 'FX'
 GROUP BY sdate, symbol, trunc(baridx/12)
 ORDER BY sdate, symbol, BarIdx;
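One caveat on the MySQL port: MySQL has no native CREATE MATERIALIZED VIEW, so the usual substitute is a real summary table refreshed with an INSERT ... SELECT (or REPLACE ... SELECT) on whatever schedule you like. A minimal sketch along the same lines, using a deliberately simplified, hypothetical subset of the columns above:

-- Summary table standing in for the materialized view.
CREATE TABLE fx_1hour (
    sdate   DATE        NOT NULL,
    symbol  VARCHAR(12) NOT NULL,
    baridx  INT         NOT NULL,   -- hour-bar index within the day
    high    DECIMAL(12,5),
    low     DECIMAL(12,5),
    ticks   INT,
    PRIMARY KEY (sdate, symbol, baridx)
);

-- Refresh: re-aggregate from the five-minute base table
-- (12 five-minute bars per one-hour bar).  REPLACE overwrites
-- any rows already present for the same key.
REPLACE INTO fx_1hour (sdate, symbol, baridx, high, low, ticks)
SELECT sdate, symbol, FLOOR(baridx/12), MAX(high), MIN(low), SUM(ticks)
FROM fx5min
GROUP BY sdate, symbol, FLOOR(baridx/12);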
Fundamental data (quarterly) is very different from price and volume data (intraday or daily). Stock charting software is better suited to price and volume figures. Unless you want to develop your own TA software, there are commercial packages you can consider. You can then focus your effort on linking the TA and FA together.
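If both data sets do end up in SQL, the TA/FA link is essentially a point-in-time date join. A minimal sketch, assuming hypothetical prices_daily and fundamentals_q tables (all names and columns are illustrative):

-- For each daily price row, attach the most recent quarterly report
-- dated on or before that trading day, to avoid look-ahead bias.
SELECT p.symbol,
       p.trade_date,
       p.close,
       f.eps,
       p.close / f.eps AS quarterly_pe  -- crude ratio from the latest single quarter
FROM prices_daily p
JOIN fundamentals_q f
  ON f.symbol = p.symbol
 AND f.report_date = (SELECT MAX(f2.report_date)
                        FROM fundamentals_q f2
                       WHERE f2.symbol = p.symbol
                         AND f2.report_date <= p.trade_date);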
Why MySQL? Do you actually need relational tables? Why not a NoSQL database like Mongo (which I love)? I say this because the data is heterogeneous: some series you may have only monthly (e.g. a monthly employment number), some may carry OHLC fields (daily or intraday), some have volume, some open interest, some are split prices, corporate actions, etc., and some may just be tick by tick. My guess is that the only relation you need is on dates, and that is a trivial alignment in most programming languages. If I were you, I would consider a NoSQL solution like Mongo or Cassandra.
I have a Microsoft Access database with 4+ million quotes. It's very easy to use and it's inexpensive, especially compared to SQL Server.
My issue with Access is its size limit, which I think is about 2 GB per database file. MySQL is open source and free; it's where I load all my options data.
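For what it's worth, the MySQL side of that can be as simple as one table per data set. A hypothetical options-quote table, purely for illustration (all names and types are my own choices):

CREATE TABLE option_quotes (
    quote_date    DATE          NOT NULL,
    underlying    VARCHAR(12)   NOT NULL,
    expiry        DATE          NOT NULL,
    strike        DECIMAL(10,2) NOT NULL,
    call_put      CHAR(1)       NOT NULL,  -- 'C' or 'P'
    bid           DECIMAL(10,4),
    ask           DECIMAL(10,4),
    volume        INT,
    open_interest INT,
    PRIMARY KEY (quote_date, underlying, expiry, strike, call_put)
);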