Data inconsistency

Discussion in 'Data Sets and Feeds' started by BBSLamer, Oct 17, 2020 at 2:55 AM.

  1. BBSLamer


    Does anybody have a reasonable explanation for the frequent inconsistency of data across different providers, and advice on how to reconcile the differences? Here is just one example (of many) from a high profile stock.

    For this case, I will specifically be looking at float, shares outstanding, and market cap.

    Let's take Snowflake (SNOW)

    Close: 242
    Float: 28M
    Outstanding: 36M
    Market Cap: 66.7B

    Close: 242
    Float: 36M
    Outstanding: 279M
    Market Cap: 67.7B

    Close: 242
    Float: 7M
    Outstanding: 278M
    Market Cap: 67.3B

    To summarize, we have three leading sites offering three different accounts of what the float is. Shares outstanding differ slightly between two of the sites and drastically on the third. Market cap comes out at approximately 67B on all three, although not exactly, even though one would expect this number to be precise and easily calculated given the correct number of shares outstanding. What's especially noteworthy is that MarketWatch correctly lists the market cap at 67B but the shares outstanding at 36M, which is impossible (36M shares at 242 would be a market cap of about 8.7B), leading us to believe that they don't calculate the market cap on their own.
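
    A quick sanity check along these lines can be sketched in a few lines of Python. This is only a rough sketch: the provider labels A/B/C are placeholders for the three listings above (in the order quoted), and the 5% tolerance is an arbitrary assumption, not anything the sites publish.

```python
# Figures quoted above for SNOW; provider labels A/B/C are placeholders.
quotes = {
    "A": {"close": 242.0, "outstanding": 36e6, "market_cap": 66.7e9},
    "B": {"close": 242.0, "outstanding": 279e6, "market_cap": 67.7e9},
    "C": {"close": 242.0, "outstanding": 278e6, "market_cap": 67.3e9},
}

def check_market_cap(quote, tolerance=0.05):
    """Compare the reported market cap with close * shares outstanding."""
    implied_cap = quote["close"] * quote["outstanding"]
    implied_shares = quote["market_cap"] / quote["close"]
    rel_error = abs(implied_cap - quote["market_cap"]) / quote["market_cap"]
    return {
        "implied_cap": implied_cap,
        "implied_shares": implied_shares,
        "consistent": rel_error <= tolerance,  # tolerance is an assumed cutoff
    }

results = {name: check_market_cap(q) for name, q in quotes.items()}
for name, r in results.items():
    status = "ok" if r["consistent"] else "MISMATCH"
    print(name, f"implied shares {r['implied_shares'] / 1e6:.1f}M", status)
```

    Only the first listing fails the check; the shares implied by its own market cap and close land in the same range as the other two sites' outstanding figures.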

    How does one reconcile these differences, especially when consuming large amounts of data and aggregating results? Is this the "dark side" of getting free data? Even the Wall Street Journal, the pinnacle of the mainstream financial world, apparently cannot be trusted, since 36M outstanding is impossible if the market cap truly is 66B (I must note it seems they pull their data from MarketWatch).

  2. jharmon


    It's simple: Don't use free sources of data. Ever. "but?" Still here? Stop.

    Question your paid source until either:
    a) you are convinced they are useless
    b) you are proven wrong

    PS - Tried SNOW for stock data. It was garbage. This company will fail.
  3. BBSLamer


    Which company will fail? SNOW?

    Thank you for the response, but it doesn't really address my question; perhaps I wasn't clear enough. Why is the data faulty in the first place? These are relatively static metrics that don't change much from month to month, if at all. It's not like I'm asking why tick data is inconsistent across providers. Simple stats like shares outstanding, float, market cap... this is not data that needs to be paid for IMO.

    I realize there are a lot of data vendors here on ET that are trying to push their warez, but the reality is, most [basic] data really need not be paid for (either it's generally available for free, or it's already provided by one of your brokerage accounts, assuming you have the necessary skills to parse it out). Assuming you trust paid data simply because of its price tag, are we supposed to trust simply because they charge a fee, over the long established

    One way to reconcile the issue is perhaps to pull data from multiple sources and flag any values that do not agree across sources, for further investigation. The downside is that there's manual work involved, of course, but it's something.
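
    Something like the following would do the flagging part. It is a minimal sketch, assuming each source has already been parsed into a plain dict; the function name, the source labels A/B/C, and the 2% tolerance around the median are all made up for illustration.

```python
from statistics import median

def flag_disagreements(by_source, tolerance=0.02):
    """Return {field: {source: value}} for fields where any source strays from the median."""
    fields = set()
    for values in by_source.values():
        fields.update(values)
    flagged = {}
    for name in sorted(fields):
        observed = {s: v[name] for s, v in by_source.items() if name in v}
        if len(observed) < 2:
            continue  # nothing to compare against
        mid = median(observed.values())
        if any(abs(x - mid) > tolerance * abs(mid) for x in observed.values()):
            flagged[name] = observed
    return flagged

# The SNOW figures from the first post; A/B/C are placeholder labels.
snow = {
    "A": {"float": 28e6, "outstanding": 36e6, "market_cap": 66.7e9},
    "B": {"float": 36e6, "outstanding": 279e6, "market_cap": 67.7e9},
    "C": {"float": 7e6, "outstanding": 278e6, "market_cap": 67.3e9},
}
flagged = flag_disagreements(snow)
print(sorted(flagged))
```

    With these numbers, float and shares outstanding get flagged for manual review, while the market caps are within tolerance of each other and pass.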
  4. jharmon


    Yes, you said these are simple stats. They are reported by the company. However, collating such data is time consuming and very much a manual process. The definition of "float" differs from vendor to vendor. Some vendors may also update their information less frequently.

    A paid source at least will let you know when the data is updated and (if queried) why. Good luck getting a free web site or a brokerage service to tell you anything.
  5. ph1l


    A securities reference data system I used to do development on had automated rules to choose particular data when there were conflicts from vendors. When the system detected bad data, it would create a work item for an operations group (~20 people!) to potentially manually fix the data.
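
    To give a rough idea of what such a rule looks like, here is a toy sketch. It is not the actual system: the vendor names, the precedence order, the 1% tolerance, and the work-item format are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Resolver:
    # Hypothetical precedence order: earlier vendors win a conflict.
    precedence: list
    tolerance: float = 0.01  # assumed relative tolerance
    work_items: list = field(default_factory=list)

    def resolve(self, symbol, field_name, values):
        """Pick the value from the highest-ranked vendor that supplied one."""
        ranked = [v for v in self.precedence if values.get(v) is not None]
        if not ranked:
            self.work_items.append((symbol, field_name, "no vendor supplied a value"))
            return None
        chosen = values[ranked[0]]
        for vendor in ranked[1:]:
            # Queue a work item for operations when a lower-ranked vendor disagrees.
            if abs(values[vendor] - chosen) > self.tolerance * abs(chosen):
                self.work_items.append(
                    (symbol, field_name, f"{vendor} disagrees with {ranked[0]}")
                )
        return chosen

resolver = Resolver(precedence=["vendor_x", "vendor_y", "vendor_z"])
shares = resolver.resolve(
    "SNOW", "outstanding",
    {"vendor_x": 279e6, "vendor_y": 278e6, "vendor_z": 36e6},
)
print(shares, resolver.work_items)
```

    The rule picks a value automatically, and anything that looks off ends up in a queue for a human to check.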

    The data was expensive, and my employer would pay for significant development to drop vendors and find alternatives when the vendors raised prices too much and/or if the data quality/timeliness was too bad.
  6. jharmon


    Whilst the metrics don't change much, the sampling frequency is the issue.

    Some companies interpret every single filing/announcement. Other companies may only look at quarterlies. Others may only look at annuals. Some might ignore intra-quarter events such as stock splits, convertible note conversions, employee share issues, private/public capital raising events etc. Some might be lagged by a few days too. Some might prioritize S&P 500/1500 or Russell 1000/3000 companies ahead of others.

    Index providers typically monitor things but only make changes when there are significant events. So, a real event (say 4% additional shares offered in a private capital raising) might not be reflected until the next quarterly review.
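
    In code terms the behaviour is roughly a threshold rule. The sketch below is only an illustration: the function name and the 5% threshold are assumptions on my part, and real index methodologies differ by provider.

```python
def published_shares(current, actual, is_review_date, threshold=0.05):
    """Return the share count a provider would publish under a threshold rule."""
    change = abs(actual - current) / current
    # Update immediately only for large changes; otherwise wait for the review.
    if is_review_date or change >= threshold:
        return actual
    return current

# A 4% share increase: ignored between reviews, picked up at the review.
stale = published_shares(100e6, 104e6, is_review_date=False)
fresh = published_shares(100e6, 104e6, is_review_date=True)
print(stale, fresh)
```

    So two providers looking at the same company on the same day can legitimately show different share counts, depending on where each one is in its review cycle.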

    Last I heard, Reuters had a team of hundreds of Indian nationals doing this.

    Any metric you can't verify yourself is prone to error. Thanks @ph1l for sharing your experiences too.