cleaning up missing timestamps in historical data

Discussion in 'Data Sets and Feeds' started by Elitist Trader, Jan 28, 2011.

  1. So I have two data sets, I want to find out running correlation between the two.
    In each data set there are some minutes where the contract didn't trade and when it doesn't trade there is no time-stamp, so when im trying to find running correlation some minutes arent there in one contract and so I need to add a 'fake' minute so the two dont get out of sync. any suggestions?

    I'm doing all this in excel, maybe theres a better program I should use?
     
  2. LeeD

    LeeD

    You can fill missing minutes with interpolation... but this will skew the results.

    If missing minutes are few, a better solution is to exclude all data points where one of the data points is missing. Assuming you are doing correlation on returns and not actual prices, the next step is to adjust returns where the return is over a longer than one-minute period. For 2-minute periods devide return by square root of 2; for 3-minute periods by square root of 3 etc.

    If lots of data points are missing in an irregular way, you need to use more robust estimates.

    Also note, if you use 10-minute or 1-hour data instead of 1-minute, the correlations will likely be quite different. So, think for yourslef regarding what frequency of sampling is most relevant.

    Out of inexpensive or free statistical tools, R is probably the best.
     
  3. Is there anywhere I could get the data with the missing minutes already filled in?

    Regarding filling missing minutes, i would be most interested in filling missing minutes with the closing price of the previous bar
     
  4. bone

    bone

    I know that when I order data from CQG's data factory, I ask for it 'cleaned'. For the record, I am getting it in ASCII format.

    There are also some data mining suites that you can write a simple macro to make the necessary adjustments. Some trading oriented analysis data suites as I recall have a canned feature with a time/date/year setting where you specify the timeframes you want cleaned.

    Bloomberg and Reuters time and sales data is generally pretty good as is.
     
  5. LeeD

    LeeD

    Missing minutes usually mean that there were no trades during a particular minute. So, in a way it's not "missing", it's just a way to show correctly what the market action was.

    Do a simpe test. For each missing minute calculate the difference between previos minute "Close" and following minute "Open".

    If on 99% of missing bars this difference is within 1 tick, "filling" missing minutes this way won't add too much error to the result. If the difference is often larger, you really need more robust a method.
     
  6. bone

    bone

    LeeD is correcto-mundo. + 1
     
  7. LeeD

    LeeD

  8. rosy2

    rosy2

    can you use bid/ask data instead of last trade
     
  9. LeeD

    LeeD

    You can, and it is a better way to do it on a thinly traded instrument.

    However, you need good tick data for that. Not only it's more difficult to get hold of, a few data providers only provide best bid/best ask snapshot together with a trade tick, not for every change in best bid/best ask. Using such data for filling missing minutes, for example, defeats the purpose.
     
  10. rosy2

    rosy2

    #10     Feb 3, 2011