Hello everybody The short question: Where can one acquire historical information on option chains? I am not seeking historical data (tick, 1 min, etc.) -- I am seeking a historical calendar of available options. Essentially, I want a view of the different types of contracts that existed over time. Once again, I'm not seeking any pricing information, just wanting to know what was available & when it was available to trade. As an example, the data I have in mind might take on a structure like this: https://i.imgur.com/dj2gHiX.png The long question: I have purchased 15min SPY quotes from CBOE (2005 - present). Part of cleaning the data is identifying missing quote gaps. For the following examples, assume that Contract xyz was/should-have-been traded from '2018-08-07 09:30:00' to '2018-08-13 16:15:00' I am categorizing these gaps as 3 different types: 1) Front Gaps - Contract xyz was not first quoted until 2018-08-10 2) Mid Gaps - Contract xyz is missing quotes at from '2018-08-11 10:30:00' to '2018-08-11 13:30:00' 3) Back Gaps - Contract xyz was last quoted at '2018-08-12 12:45:00' It is easy to identify #2 Mid Gaps (just check if there are missing times in between first quote and last quote). It is also easy to identify #3 Back Gaps (just check if last quote != expiration). However, I am finding it difficult to identify #1 Front Gaps, which is why I find myself attempting to enumerate all the different types of contracts and when they were available. Thanks for your thoughts Cheers, Sam
Maybe the attached file will help. My data only goes back to 2017-07-25, but it can be accomplished with a SQL query like this on your CBOE data. Code: select min(quote_date) as min_date, max(quote_date) as max_date, concat_ws('-',underlying_symbol,root,expiration,option_type,strike) as contract from cboe_options_history_eod where underlying_symbol = 'SPY' group by concat_ws('-',underlying_symbol,root,expiration,option_type,strike) having min_date > '2017-07-25'
Thanks for the reply, ensemble That is more or less what I am doing now. The problem, like mentioned in the long question (above) is that I am cleaning the data to identify missing quotes, i.e. where MIN(quote_date) is historically inaccurate. So while the query you cited shows me what the data has as MIN(quote_date) -- the issue I am working to correct is identifying MIN(quote_date) that is erroneous, e.g. data has MIN(quote_date) as 2017-11-29 but should have been quoted on 2017-11-24. Take as an example the following: https://i.imgur.com/NSz1tcJ.png I know for a fact that there should have been quotes for this contract on 2017-11-24, but the contract is not first quoted until 2017-11-29; I am working to identify all of these cases from 2005 to present.
At first I thought `min(open_interest) > 0` could identify those, but I looked up that contract in my DB and the OI on 2017-11-29 was zero. So you may need to join the CBOE data with another source like IVolatility to validate it.
It's surprising to me that there is no resource that provides the type of calendar summary I am seeking. I think I'm going to go old-school and use the CBOE data I already have to match it up against the pdf calendars. Slightly painstaking and tedious but should get the job done. Just out of curiosity... can you confirm that 2017-11-29 also the first day you see that contract quoted?
Okay thanks. I will try and update this thread with some data whenever I get a solid handle on the full 2005-present calendar.
For anyone else wondering, the historical listings data is available from CBOE at http://www.cboe.com/trading-resources/new-listings-series/daily-new-series-expire-memos Attached is a csv with all of the urls for easily downloading the files