Store OHLCV data in MySql database.

Discussion in 'Data Sets and Feeds' started by mengtrader, Apr 13, 2010.

  1. Hello,

    I plan to store OHLCV data in Mysql database for backtesting purpose. In your opinion, besides the EOD data, if you need weekly data, do you suggest to calculate them on the fly or store those weeekly/monthly OHLCV data along with daily data in the same time series table?

    My intital design is like:

    timeSeries table: symbol_id,date_time,o,h,l,c, v, timeframeID
    timeFrame table: timeframeID, descripiton

    If you do store those weekly/monthly data, do you mind sharing some database design idea regarding to how to store timeseries data?

    Thanks a lot in advance!
    Andrew
     
  2. It depends - seriously. Day / Week is relatively fast (factor of 7 for the data). OTOH minute to hour would be a factor of 60.

    My own design has:

    MarketAggregate table (OHLC etc.)
    MarketAggregateType table (string codes how it is calculated, like "Hourly(1)" - allows various types

    I precalculate ;)

    Note that the string "Hourly(1)" would be mappable to the generating class - this is why there is no short parameter list. This is basically the only really generic way, given that some agrgregators take more than one parameter, others do not at all etc.
     
  3. jeffweng

    jeffweng

    I would suggest you store your weekly/monthly data separately to different tables. To me it's a more clearer design.

    You will have tables like daily_table, weekly_table and monthly_table.
    If you have too much data on the daily_table, you can create one table for each symbol, such as AAPL_DAILY, MSFT_DAILY.

    Anyway, the bottom line is, you don't wanna put too many records on one single table.

    Hope this help.
     
  4. All that is a total violation of what SQL is for. It also makes comparisons hard, SQL wise. In general, you should not do that unless you have a lot of data - that is several billions of rows. On on-crappy hardware this is double digit several billions.

    At the end, it is a BAR - how it is generated is part of the metadata, but there is no reason for multiple tables.
     
  5. jeffweng

    jeffweng

    Thanks for pointing that out. I knew it was a violation of dependency. The data I was dealing with was one-minute data of 1,500 stocks over 10+ year period. At first I stored them into one single table and did the calculating in the program, which is written in Excel VBA. After several freezing up in the Excel, I decided to divide them up into different tables and the program ran much smoother since then.

    PS: My server is kinda crappy.
     
  6. Fractal

    Fractal

    This is just a random question, but how do you deal with bad data?

    Do you identify and adjust incorrect OHLC data via SQL? Or is your data direct from the exchange and always 100% accurate?
     
  7. Correcting via SQL is normally not feasible uinless you precalculate manually - and even then, sorry, I prefer to fix up the underlying raw data and let the result be regenerated.

    I actually still worko n that part. The idea would be that a tick change / delete will trigger a marking of all relying bars as invalid (basedo n timestamps of the bar start and end - a good reason to have that with high accuracy). THen the buffer will regenerate these bars. FOr some (metadata) a change would mean deleting all following bars (range bars may have changes, as may point and figure type aggregates).
     
  8. Thanks a lot for the sharing!

    I do like your idea but having difficulty in understanding how it is actually implemented in MySql.

    Can you share more about the E-R model for those two tables? What do you mean by "mapping to the generating class"?

    Thanks a lot,
    Andrew
     
  9. Fractal

    Fractal

    Interesting. Thank you for sharing your thoughts.
     
  10. LeeD

    LeeD

    Something along the lines of linked tables and stored procedures?
     
    #10     Apr 13, 2010