Historical S&P 500 data, dealing with old tickers, changes, etc.

Discussion in 'Data Sets and Feeds' started by taotree, Feb 15, 2024.

  1. taotree

    I want to do some analysis on daily prices of stocks in the S&P 500 over the last 10-20 years. Most datasets I have found only contain the stocks in the S&P 500 right now, but analyzing only that list is subject to survivorship bias.

    I found a dataset listing the index's past components. The next step is getting price data for them:

    At first, I tried using the Tradier API, but it returned no data for delisted stocks, of which there are a surprising number. Also, I found cases where data was missing, and a few cases of its data looking questionable, such as the open for one symbol jumping from around 34 to 0.005 and back over 3 days, another case of 2 digits to 4 digits and back, etc.
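    A minimal sketch of a sanity check that would catch jumps like the ones described above, assuming the bad value lasts a single day and the prices on either side agree with each other. The function name `flag_price_spikes` and the `max_ratio` threshold are illustrative choices, not part of any provider's API.

```python
from typing import List


def _ratio(a: float, b: float) -> float:
    """Ratio of the larger value to the smaller one (always >= 1)."""
    return max(a, b) / min(a, b)


def flag_price_spikes(prices: List[float], max_ratio: float = 5.0) -> List[int]:
    """Return indices of prices that look like isolated bad ticks.

    A price is flagged when it differs from BOTH neighbours by more than
    max_ratio while the two neighbours agree with each other. A one-way
    level shift (for example an unadjusted split) is deliberately not flagged.
    Non-positive prices are always flagged.
    """
    flagged = []
    for i, cur in enumerate(prices):
        if cur <= 0:
            flagged.append(i)
            continue
        if i == 0 or i == len(prices) - 1:
            continue  # no neighbour on one side, cannot judge
        prev, nxt = prices[i - 1], prices[i + 1]
        if prev <= 0 or nxt <= 0:
            continue  # the neighbour itself is bad and is flagged on its own
        if (_ratio(cur, prev) > max_ratio
                and _ratio(cur, nxt) > max_ratio
                and _ratio(prev, nxt) <= max_ratio):
            flagged.append(i)
    return flagged


# Illustrative series modelled on the cases described in the post.
opens = [33.9, 34.1, 0.005, 34.0, 34.2]
print(flag_price_spikes(opens))
```

    Flagged rows still need a manual look or a cross-check against a second source; the threshold of 5 is arbitrary and would need tuning for very volatile names.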

    Then I found this article about finding a data provider.

    So, I tried Tiingo's API. The data looks OK, and it appears to include delisted stocks. But I very quickly ran into the problem of ticker name changes and the like. Searching around, I think I found that ABC became COR, ABS became ACI, and ABX... maybe GOLD? But I'm not sure. ACKH... I don't know. And yes, I'm going alphabetically: that's how many changes I've hit already, and I'm not even past AC* yet.

    Is there somewhere that has info that can help figure this out? It's going to take a long time to research all of these (probably hundreds) one at a time.
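    Once a rename list is available from some source, applying it is the easy part. A minimal sketch, assuming a plain old-to-new mapping: the `RENAMES` table holds only the two pairs guessed at in the post (unverified) plus a made-up `OLD1`/`OLD2`/`NEW` chain, and `resolve_ticker` is a hypothetical helper, not any provider's function.

```python
from typing import Dict

# Old symbol -> next symbol. The first two pairs are the poster's own
# unverified guesses; the OLD1/OLD2/NEW chain is invented for illustration.
RENAMES: Dict[str, str] = {
    "ABC": "COR",
    "ABS": "ACI",
    "OLD1": "OLD2",
    "OLD2": "NEW",
}


def resolve_ticker(ticker: str, renames: Dict[str, str]) -> str:
    """Follow a chain of renames until reaching a symbol with no further entry."""
    seen = {ticker}
    while ticker in renames:
        ticker = renames[ticker]
        if ticker in seen:
            # A loop in the table would otherwise spin forever.
            raise ValueError(f"rename cycle detected at {ticker!r}")
        seen.add(ticker)
    return ticker


print(resolve_ticker("ABC", RENAMES))
print(resolve_ticker("OLD1", RENAMES))
print(resolve_ticker("XYZ", RENAMES))  # unknown symbols pass through unchanged
```

    A date-free mapping like this breaks down when a symbol is later reused by a different company, so a real table would also need an effective date for each change.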
     
  2. I recently had a similar issue when fetching historical data from Polygon.io. They have a "ticker events" API that returns a list of ticker symbol changes, but it's still in development, it had some issues when I tried it, and there's no ETA for the release. I'm also not sure whether it can be used for delisted stocks, but you should be able to try it for free on their website.
     
  3. NorgateData

    NorgateData Sponsor

    @taotree This is exactly what we offer (the dataset you mentioned actually references Andreas Clenow's set, which comes from us).

    The problem you are describing is a point-in-time data set issue.

    Stocks in the S&P 500 (and other indices) vary over time: new securities are added and others are removed for various reasons, including market-cap ineligibility at rebalance time, takeovers, bankruptcy, and complex corporate actions. Such securities can also change symbols, so any "point-in-time" set of symbols becomes stale and inaccurate over time.
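    To make the point-in-time idea concrete, here is a minimal sketch of rebuilding index membership for a given date from a log of additions and removals. It is an illustration only, not a description of how any vendor stores its data; the `Change` record, the `members_on` helper, and the tickers and dates are all made up.

```python
from datetime import date
from typing import Iterable, NamedTuple, Set


class Change(NamedTuple):
    day: date
    ticker: str
    action: str  # "add" or "remove"


def members_on(changes: Iterable[Change], as_of: date) -> Set[str]:
    """Replay add/remove events up to and including as_of."""
    members: Set[str] = set()
    # sorted() is stable, so same-day events keep their input order.
    for ch in sorted(changes, key=lambda c: c.day):
        if ch.day > as_of:
            break
        if ch.action == "add":
            members.add(ch.ticker)
        elif ch.action == "remove":
            members.discard(ch.ticker)
        else:
            raise ValueError(f"unknown action {ch.action!r}")
    return members


# Hypothetical event log: AAA is replaced by CCC in mid-2015.
CHANGES = [
    Change(date(2010, 1, 4), "AAA", "add"),
    Change(date(2010, 1, 4), "BBB", "add"),
    Change(date(2015, 6, 1), "AAA", "remove"),
    Change(date(2015, 6, 1), "CCC", "add"),
]

print(sorted(members_on(CHANGES, date(2012, 1, 1))))
print(sorted(members_on(CHANGES, date(2020, 1, 1))))
```

    Backtesting against `members_on(..., test_date)` rather than today's list is what avoids the survivorship bias raised at the top of the thread, provided the symbols in the log are themselves tracked through renames.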

    On the S&P 500, we've done the (extensive) research over time back to the inception of the S&P 500 (March 1957) and other indices as shown here:
    https://norgatedata.com/data-content-tables.php#ushics

    We offer a lookback-limited (2 year) trial of the data set here:
    https://norgatedata.com/freetrial.php
     
  4. %%
    I see your points, t tree,
    but since you can't buy that old benchmark, it sounds like you are barking up the wrong tree.
    It would be easier to figure that out on the [12+30] DOW,
    but it's been an underperformer for so long, and by so much, that I'm not doing that myself. To each his or her own.