Historical Options Data for statistical analysis

Discussion in 'Options' started by cdcaveman, Jul 19, 2012.

  1. Where do you get historical options data, i.e. expired contracts and their related price action, volume, volatility, etc.? I have an IB account; it doesn't come with that, does it?
  2. Expensive and voluminous.

    Nanex NxCore provides historical OPRA data.
    Daily tape files run to several GB, typically 5GB to 15GB compressed per day.
    A couple of hundred bucks per month for the data, and significantly more for the infrastructure to process and manage it.

    5 years of OPRA data, with 500K symbols consolidated and ripped into 25ms, 1 sec, 1 min, and daily bars, exceeds 50TB.
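For a rough sense of scale, the compressed tape sizes quoted above can be multiplied out. This is only back-of-the-envelope arithmetic: the 252 trading days per year is an assumption, and the function name is made up for illustration.

```python
TRADING_DAYS_PER_YEAR = 252  # assumption: typical US trading calendar


def compressed_tape_tb(years, gb_per_day, trading_days=TRADING_DAYS_PER_YEAR):
    """Total compressed tape size in TB (1 TB = 1000 GB) for a given daily file size."""
    return years * trading_days * gb_per_day / 1000.0


low = compressed_tape_tb(5, 5)    # 5 years at 5GB per day
high = compressed_tape_tb(5, 15)  # 5 years at 15GB per day
print(f"5 years of compressed tapes: {low:.1f}TB to {high:.1f}TB")
```

That covers the compressed daily tapes only; the 50TB figure refers to the bars extracted from them.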

  3. Sounds like pocket change.. haha, I'm a dork... So basically, if you were running some analytic scan of history across that many symbols, it would make sense to take the fifty TB, but what about just the stocks that you follow?
  4. hft_boy


    That's it? I thought it was like several grand a month. Do you know about daily data, then? I guess one datafeed is as good as another for that...
  5. derTrader


    Any chance of getting historical options EOD data cheaper?
  6. TskTsk


    IB has historical data for options, not sure how far back it goes though.

    Also, if you have historical data on the underlying (UL), you can price breakeven vol, and thus a breakeven vol surface, from realized vol on the UL. Not sure if that's useful for you though.
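As a starting point for that idea, here is a minimal sketch of the realized vol input only: annualized close-to-close volatility from a list of daily closes. It is not the full breakeven vol calculation, and the prices and the 252-day annualization are assumptions for illustration.

```python
import math


def realized_vol(closes, periods_per_year=252):
    """Annualized close-to-close realized volatility from a list of closing prices."""
    if len(closes) < 3:
        raise ValueError("need at least 3 closes (2 returns) for a sample std dev")
    # Log returns between consecutive closes
    log_returns = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    mean = sum(log_returns) / len(log_returns)
    # Sample variance (n - 1 in the denominator)
    var = sum((r - mean) ** 2 for r in log_returns) / (len(log_returns) - 1)
    return math.sqrt(var * periods_per_year)


closes = [100.0, 101.0, 100.0, 101.0, 100.0]  # made-up prices
print(f"realized vol: {realized_vol(closes):.2%}")
```

Running this over rolling windows of the underlying's history gives the realized vol series that a breakeven vol estimate would be built on.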
  7. just21


    www.eoddata.com have historic US option prices included in the $29.95 subscription.
  8. EOD data is free or low cost, e.g. eoddata.com, Yahoo, etc.
    The problem is that many options trade thinly and last prints can be days old.
    You also need OHLC and an adjusted close for splits, dividends, etc.

    Most services are based on the last print, which yields substantially different results than a tick-accurate extraction based on bid/ask quotes. That makes a significant difference for intraday analysis but may not be relevant for EOD-type analysis.

    We build out and track OHLC on bid/ask and VWAP on trades between quote ticks. We found this to be a much more reliable data scheme, as it accurately reflects "executable" market conditions. There is no reliable 3rd party for this type of data, so we build it in house.
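A simplified sketch of that kind of bar, for illustration only: it tracks OHLC separately on bid and ask and a VWAP on trades, but buckets everything into fixed time bars rather than working between individual quote ticks. The tuple layouts, the function name, and the sample data are all assumptions, not the in-house scheme described above.

```python
def build_bars(quotes, trades, bar_seconds=60):
    """quotes: iterable of (ts, bid, ask); trades: iterable of (ts, price, size).

    Returns {bar_start: {"bid": [o, h, l, c], "ask": [o, h, l, c], "vwap": float or None}}.
    """
    bars = {}

    def bar_for(ts):
        # Map a timestamp (in seconds) to the start of its bar
        start = int(ts // bar_seconds) * bar_seconds
        return bars.setdefault(
            start, {"bid": None, "ask": None, "notional": 0.0, "volume": 0}
        )

    # Process quotes in time order so open/close are the first/last quote in the bar
    for ts, bid, ask in sorted(quotes, key=lambda q: q[0]):
        bar = bar_for(ts)
        for side, px in (("bid", bid), ("ask", ask)):
            if bar[side] is None:
                bar[side] = [px, px, px, px]
            else:
                o, h, l, _ = bar[side]
                bar[side] = [o, max(h, px), min(l, px), px]

    # Accumulate traded notional and volume per bar for the VWAP
    for ts, price, size in trades:
        bar = bar_for(ts)
        bar["notional"] += price * size
        bar["volume"] += size

    result = {}
    for start in sorted(bars):
        b = bars[start]
        vwap = b["notional"] / b["volume"] if b["volume"] else None
        result[start] = {"bid": b["bid"], "ask": b["ask"], "vwap": vwap}
    return result


# Made-up quotes and trades for one option contract
quotes = [(0, 1.00, 1.10), (20, 1.05, 1.15), (45, 0.95, 1.05), (70, 1.00, 1.10)]
trades = [(10, 1.05, 10), (30, 1.10, 30)]
bars = build_bars(quotes, trades)
```

A bar with quotes but no trades still carries a usable bid/ask range, which is the point of not relying on last prints for thinly traded options.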

    I pulled just the EOD equities data from 2008 to present.
    The DB is just over 12 million records, 24K distinct symbols and 2GB in size.
    The corresponding splits/divs/symbol changes are relatively small but are vital for historic analysis.
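To show why the split table matters, here is a minimal sketch of back-adjusting closes for splits. It assumes ISO date strings and a hypothetical (ex_date, ratio) layout for the split records, and it ignores dividends and symbol changes entirely.

```python
def split_adjust(closes, splits):
    """Back-adjust closing prices for stock splits.

    closes: list of (date, close) with ISO date strings, e.g. "2012-01-03".
    splits: list of (ex_date, ratio); ratio is 2.0 for a 2-for-1 split.
    Prices dated before a split's ex-date are divided by that split's ratio.
    """
    adjusted = []
    for date, close in closes:
        factor = 1.0
        for ex_date, ratio in splits:
            if date < ex_date:  # ISO date strings compare chronologically
                factor *= ratio
        adjusted.append((date, close / factor))
    return adjusted


# Made-up prices around a hypothetical 2-for-1 split on 2012-01-05
closes = [("2012-01-03", 100.0), ("2012-01-04", 102.0), ("2012-01-05", 51.5)]
adjusted = split_adjust(closes, [("2012-01-05", 2.0)])
```

Without the adjustment, the unadjusted series shows a 50% one-day drop that never happened, which would wreck any return or volatility statistic computed across the split date.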

    Easily manageable in MATLAB and Excel, either by loading symbol sets or by running queries against the indexed tables, on disk or loaded in memory.

    An example might be a dispersion algo analyzing SPY against its constituents. 501 symbols x 5 years of EOD data results in an Excel workbook of 550,000 rows x 8 columns, about 80MB, with near-instantaneous result sets using AutoFilters. The flow is: High Resolution Data Repository >>> Extract Focus Instrument Sets EOD Data >>> Load into Analysis Engine. This style of processing optimizes well for speed and performance and reduces the working data size.
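The extract step of that flow might look roughly like the following, using an in-memory SQLite database as a stand-in for the indexed EOD tables. The table name, columns, index, and sample rows are all hypothetical; the post does not say which database or schema is actually used.

```python
import sqlite3

# In-memory database standing in for the indexed EOD store (schema is an assumption)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE eod (symbol TEXT, trade_date TEXT, open REAL, high REAL, "
    "low REAL, close REAL, volume INTEGER)"
)
conn.execute("CREATE INDEX idx_eod_symbol_date ON eod (symbol, trade_date)")

# Made-up rows, not real prices
rows = [
    ("SPY", "2012-06-29", 135.0, 136.3, 134.9, 136.1, 1000),
    ("SPY", "2012-07-19", 137.6, 138.2, 137.2, 137.7, 1200),
    ("SPY", "2012-07-20", 137.0, 137.2, 136.3, 136.5, 1100),
    ("AAPL", "2012-07-19", 611.0, 615.0, 605.0, 614.0, 900),
    ("XOM", "2012-07-19", 85.0, 86.0, 84.5, 85.5, 800),
]
conn.executemany("INSERT INTO eod VALUES (?, ?, ?, ?, ?, ?, ?)", rows)


def extract_focus_set(conn, symbols, start_date, end_date):
    """Pull closes for a focus set of symbols over a date range."""
    placeholders = ",".join("?" for _ in symbols)
    sql = (
        "SELECT symbol, trade_date, close FROM eod "
        f"WHERE symbol IN ({placeholders}) AND trade_date BETWEEN ? AND ? "
        "ORDER BY symbol, trade_date"
    )
    return conn.execute(sql, [*symbols, start_date, end_date]).fetchall()


focus = extract_focus_set(conn, ["SPY", "AAPL"], "2012-07-01", "2012-07-31")
for row in focus:
    print(row)
```

Only the focus set leaves the repository, so whatever does the analysis afterwards works on a few hundred thousand rows instead of the full store.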

  9. It might be worth taking a look at ivolatility.com or livevol.com. They both have EOD data at reasonable prices, and you can specify what data you'd like.

    AFAIK, IB's historical data shouldn't go back further than 1 year, and likely less than that.
  10. A single ticker (and strike) of options data is actually quite small; the problem comes when you take a single ticker over five years and add in all the strikes and expiration months. One symbol can quickly turn into a small equity feed in terms of size.
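To put a rough number on that multiplication, here is a small sketch for EOD rows only. The strike and expiration counts are made-up assumptions (real chains vary widely by underlying), as is the 252-day year.

```python
def eod_rows_per_underlying(strikes, expirations, trading_days, sides=2):
    """EOD rows for one underlying: one row per listed contract per trading day.

    sides=2 counts calls and puts at each strike and expiration.
    """
    return strikes * expirations * sides * trading_days


# Hypothetical chain: 40 strikes x 8 expirations listed at any time, over 5 years
rows = eod_rows_per_underlying(strikes=40, expirations=8, trading_days=5 * 252)
print(f"{rows:,} EOD rows for one underlying")
```

The same underlying as a stock would be 1,260 EOD rows over those five years, and tick data multiplies the options figure again by however many quote updates each contract sees per day.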

    For reference, when I started recommending database sizes for NxCore equity data, I assumed that ~10-12TB would be overkill. With one client that I host, we are now expanding to 36TB in total, and that's only a 6-month "band aid". (The data is compressed 20:1 and they are extracting many variations.)

    I have a firm that has been kicking tires for three months over the database size and cost for an OPRA options feed. Considering we went from 10-12TB to 36TB for equities, and that the options feed is larger, I'd say 50-100TB would be a decent start for options data, and that too would be a "band aid" for a short-term solution. It's a full-time job to maintain 50-100TB of data.

    Depends on what you need though. Tick by tick with detail is always going to be massive compared to EOD or basic bars. Just adding my $0.02 and comment so that storage is also taken into account. Paying for the data is only half the equation.
    #10     Jul 20, 2012