High-resolution US stock data providers?

Discussion in 'Data Sets and Feeds' started by BlackPhoenix, Dec 8, 2023.

  1. Sticking to stocks only for now, since stocks are the only thing I can trade atm. I've got my hands full getting my trading platform working even with just stocks ;)
     
    #11     Dec 14, 2023
  2. Looks like I'll go with polygon.io, since it's pretty much the only provider offering aggregated 5sec intraday bars that I could find on a long list of data providers. It doesn't seem to have an option to filter out pre-/post-market data, though, which would be nice for reducing the download size, so for now I just have to throw out half of the downloaded data instead. Unlike some providers, they are also transparent about pricing, and I think $29/mo is pretty good for 5 years of historical stock data.
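
    Until such a filter exists, the extended-hours bars can be dropped locally after download. A minimal sketch, assuming each bar is a dict whose `t` field holds the bar's start time in Unix milliseconds (my reading of Polygon's aggregate results; verify against your own data) and that regular trading hours are 9:30-16:00 America/New_York. Holidays and early closes are not handled, and `is_rth` / `keep_rth` are hypothetical helper names.

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo

# Assumption: regular trading hours are 9:30-16:00 New York time.
NY = ZoneInfo("America/New_York")
RTH_OPEN = time(9, 30)
RTH_CLOSE = time(16, 0)


def is_rth(ts_ms: int) -> bool:
    """True if a bar starting at ts_ms (Unix ms, UTC) starts within regular hours."""
    # fromtimestamp with a tz applies the correct EST/EDT offset for that instant.
    local = datetime.fromtimestamp(ts_ms / 1000, tz=NY)
    return RTH_OPEN <= local.time() < RTH_CLOSE


def keep_rth(bars: list[dict]) -> list[dict]:
    """Drop pre-/post-market bars; assumes each bar has a 't' key in Unix ms."""
    return [bar for bar in bars if is_rth(bar["t"])]
```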
     
    #12     Dec 16, 2023
  3. You can actually specify the time interval for the bars to download with Unix millisecond timestamps if you download the data one day at a time, which you need to do anyway for 5sec bars. Downloading seems to be around 60x faster than with IBKR, which is very nice, and downloading the Russell 1000 for 5y should take ~5 days.
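
    For reference, a minimal sketch of building one such per-day request. The path layout (`/v2/aggs/ticker/{ticker}/range/{multiplier}/{timespan}/{from}/{to}`), the `adjusted`, `sort`, `limit` and `apiKey` query parameters, and the 50,000-bar cap follow Polygon's aggregates endpoint as I understand it; treat them as assumptions and check the current docs. `aggs_url` and `bars_in_window` are hypothetical helper names.

```python
from urllib.parse import urlencode

# Assumed base URL for Polygon's aggregates endpoint; check the current docs.
AGGS_BASE = "https://api.polygon.io/v2/aggs/ticker"
MAX_LIMIT = 50_000  # assumed per-request cap on returned bars


def aggs_url(ticker: str, from_ms: int, to_ms: int, api_key: str,
             multiplier: int = 5, timespan: str = "second") -> str:
    """Build an aggregates URL whose range is given as Unix-millisecond timestamps."""
    if to_ms <= from_ms:
        raise ValueError("to_ms must be later than from_ms")
    query = urlencode({"adjusted": "true", "sort": "asc",
                       "limit": MAX_LIMIT, "apiKey": api_key})
    return f"{AGGS_BASE}/{ticker}/range/{multiplier}/{timespan}/{from_ms}/{to_ms}?{query}"


def bars_in_window(from_ms: int, to_ms: int, bar_seconds: int = 5) -> int:
    """Upper bound on the number of bars a window can contain."""
    return (to_ms - from_ms) // (bar_seconds * 1000)
```

    A full extended session of roughly 16 hours (about 4:00-20:00 ET) holds at most 11,520 five-second bars, comfortably under the assumed cap, whereas five years in a single request would far exceed it, which fits with fetching one day per request.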
     
    #13     Dec 18, 2023
  4. Some more info about polygon.io. Their compliance team emailed me to verify that I'm eligible for their "individual" plan, because I had used an old email address from my company that ceased operating years ago, and they asked what this company is and what my association with it is. It was kind of interesting that simply using an email address triggered them to contact me, and they even dug out the name of the company ;) This was cleared up, but as a caution, be careful which email address you use with them to avoid extra hurdles, and perhaps just use a Gmail address or something :D

    They also seem to be throttling my access now, down to about half of what it was in the beginning. I used to be able to download 5y of 5sec bars in ~300 seconds, and now the same data takes 700+ seconds. This is pretty consistent, so it's not due to load on their servers.
     
    #14     Dec 21, 2023
  5. Polygon.io

    Polygon.io Sponsor

    Hey, I'm happy to hear that you're finding value in the second-level aggregates. To comment on a few things:
    - We plan to introduce a parameter to filter out extended hours data, so hopefully that will resolve your need to query unnecessary data.
    - We have to run compliance checks frequently to ensure no one has "misclassified" themselves. The exchanges/regulators are very strict about pro/non-pro classification and are eager to impose heavy fees on us for anyone who may have misclassified themselves.
    - We are not throttling your access, you may have been impacted by a degraded performance issue which increased latencies: https://polygonstatus.com/

    Should you have any questions or concerns, please do not hesitate to reach out!
     
    #15     Dec 22, 2023
  6. Yeah, I think the download speed is back to normal now, so maybe the increased latencies were the issue.

    If you fetch the data one day at a time (like I do, because I download 5sec aggregate bars), you can work around the lack of an RTH parameter with Unix millisecond timestamps, which are the alternative to dates for specifying the query range. Calculating the timestamp itself is pretty straightforward, but there's a complication: these timestamps are in UTC, so you need to convert the exchange's local session times to UTC AND take daylight saving time (e.g. EDT/EST for NYSE/Nasdaq) into account, which is a bit of an extra hurdle.
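
    A minimal sketch of that calculation, assuming Python 3.9+ with `zoneinfo` and an available IANA time zone database (on Windows this may need the `tzdata` package). Using the `America/New_York` zone lets the library pick EST or EDT for each date, so no manual DST table is needed. A normal 9:30-16:00 session is assumed, holidays and early closes are not handled, and `rth_window_ms` is a hypothetical helper name.

```python
from datetime import date, datetime, time
from zoneinfo import ZoneInfo

NY = ZoneInfo("America/New_York")  # handles the EST/EDT switch automatically


def rth_window_ms(day: date) -> tuple[int, int]:
    """Return (open, close) of regular trading hours on `day` as Unix ms (UTC).

    Assumes a normal 9:30-16:00 session; holidays and half days are not handled.
    """
    open_local = datetime.combine(day, time(9, 30), tzinfo=NY)
    close_local = datetime.combine(day, time(16, 0), tzinfo=NY)
    # .timestamp() on an aware datetime gives seconds since the Unix epoch.
    return int(open_local.timestamp()) * 1000, int(close_local.timestamp()) * 1000
```

    Whether the `to` bound is inclusive is worth checking against Polygon's docs; if it is, a bar starting exactly at 16:00 could slip in, and ending the window at 15:59:55 would exclude it.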

    Also, I reported an issue that the volume data occasionally comes in exponent notation (e.g. "2.086556e+06" for 2086556, which is a bit silly) when the value is large enough, which can mess up JSON parsing. You won't necessarily see this if you use a JSON viewer plugin, as the plugin may convert the value automatically before displaying it (as it did for your helpdesk person), but that's how the raw data comes in.
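
    Exponent notation is valid JSON number syntax, so a standard parser accepts it; the trouble tends to show up in code that expects volume to be an integer. A minimal defensive sketch, assuming the payload is a JSON object with a `results` list whose bars keep volume under `v` (my reading of Polygon's format; verify against your data). Floats are parsed as `Decimal` to avoid binary rounding, and `parse_aggs` is a hypothetical helper name.

```python
import json
from decimal import Decimal


def parse_aggs(raw: str) -> list[dict]:
    """Parse an aggregates JSON payload and normalise volume to int where possible."""
    # parse_float=Decimal turns "2.086556e+06" into Decimal('2.086556E+6')
    # instead of a binary float.
    payload = json.loads(raw, parse_float=Decimal)
    bars = payload.get("results", [])
    for bar in bars:
        volume = Decimal(bar["v"])  # works for both int and Decimal inputs
        if volume == volume.to_integral_value():
            bar["v"] = int(volume)
        else:
            bar["v"] = volume  # keep fractional volume rather than truncating it
    return bars
```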

    There are also some opportunities to reduce the overall download size and improve download speed. Currently the data seems to come in as gzipped JSON, but I further compress it locally to ~30% of that size with a fairly straightforward custom compression method. If data storage and internet bandwidth are major items in your bottom line (which I believe they are, since data is your whole business), then this could be an opportunity to optimize and further improve the customer experience :)

    And also a shout-out to your support, who have been very friendly and helpful the couple of times I've needed them so far :)
     
    Last edited: Dec 25, 2023
    #16     Dec 25, 2023
  7. @Polygon.io Looks like this impaired download performance on your servers is a recurring issue, while your status pages are all green. E.g. earlier today it took 326 seconds to download 5y of stock data, while now, a few hours later (1:30am UTC), exactly the same data took 1820 seconds, i.e. almost 6x longer. My internet connection downloads at 117Mbps, so the issue definitely isn't on my end. Either you really are throttling the downloads or there is some other severe bottleneck on your servers.
     
    #17     Dec 28, 2023
  8. Download speeds seem to be back to normal again. Looks like there was a ~4h window around 1:30am UTC when download speeds were abysmal. Not sure how recurring this issue is. There wasn't an unusual volume of requests to your servers at that time either (it was around 18k/sec, while it's now 20k/sec and downloads are fine).
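
    For the 326 s vs 1820 s example earlier, the ratio is 1820 / 326 ≈ 5.6. One way to spot such windows without eyeballing is to log the duration of each identical full download and flag runs that are far slower than the median. A minimal sketch; the 3x default threshold and the `flag_slow_runs` name are arbitrary assumptions.

```python
from statistics import median


def flag_slow_runs(runs: list[tuple[str, float]],
                   factor: float = 3.0) -> list[tuple[str, float]]:
    """Return (label, seconds) runs that took more than `factor` x the median.

    Assumes every run downloads the same data set, so durations are comparable.
    """
    if not runs:
        return []
    baseline = median(seconds for _, seconds in runs)
    return [(label, seconds) for label, seconds in runs if seconds > factor * baseline]
```

    The median is used instead of the mean so that a few very slow runs don't drag the baseline up and hide themselves, although it stops working if slow runs become the majority.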
     
    #18     Dec 29, 2023
  9. Graph showing the peak in download speeds:
    [image: graph of download speeds]
     
    #19     Dec 29, 2023