Databento launches real-time and historical APIs for US equity options at $1/month

Discussion in 'Announcements' started by Databento, Aug 30, 2023.

  1. Don't see how this is possible. Let's say you're REALLY good and get an individual quote down to 15 bytes. At 20m/second, that's almost 3gbit. Just for one side... Is there a "blanket subscription" method whereby one can say "give me EVERYTHING"?

    AND I'm probably radically overoptimistic on your compression. Probably well over 15 bytes/quote. I haven't looked at the tech documents though... Perhaps you can shortcut the "average bytes per quote msg" assumption of 15... Past that, the math is trivial. The true leader in compression is Nanex: 4.5 bytes/quote average, if I recall correctly. Amazing technology under the hood there.
     
    #11     Sep 7, 2023
  2. Databento

    Databento Sponsor

    There are several levels of misinformation here.
    • Peak rate is not average rate.
      • OPRA, like all multicast feeds, is bursty and has an average rate much lower than its peak rate.
      • For a sense of scale, the 50th/75th/100th percentile hourly slices of our feed yesterday were 0 GB, 197.8 GB (440 Mbps average), and 490 GB (1.08 Gbps average) respectively. You don't need compression to receive this over a dedicated interconnect and TCP.
    • We currently only normalize the data; strictly speaking, we don't compress it.
    One side is the correct measurement here because it's about the size of the feed after packet arbitration. If you count both sides, you're double-counting the throughput required for the feed.
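
    For anyone who wants to check the arithmetic, converting a slice size into an average bit rate is trivial; a minimal sketch in Python, using the figures above:

    ```python
    # Quick sanity check: average bit rate implied by an hourly slice size.
    def avg_mbps(slice_gb: float, slice_seconds: float = 3600) -> float:
        """Gigabytes delivered over a slice -> average megabits per second."""
        return slice_gb * 8 * 1000 / slice_seconds

    print(avg_mbps(197.8))  # ~440 Mbps (75th percentile slice)
    print(avg_mbps(490.0))  # ~1089 Mbps, i.e. roughly 1.08-1.09 Gbps (peak slice)
    ```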

    Yes, we do have a "blanket subscription" method. You simply pass `symbols='ALL_SYMBOLS'`, as in the sketch below.
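
    A minimal sketch with our Python client (the dataset code and schema shown are illustrative; check the docs for the names available to you):

    ```python
    import databento as db

    # Sketch: subscribe to the entire OPRA feed ("blanket subscription").
    # Dataset code and schema are illustrative.
    client = db.Live(key="YOUR_API_KEY")
    client.subscribe(
        dataset="OPRA.PILLAR",
        schema="mbp-1",          # per-exchange top-of-book quotes
        symbols="ALL_SYMBOLS",
    )

    for record in client:
        ...  # handle each normalized record as it arrives
    ```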

    If Nanex is working well for you, I think that's great.
     
    #12     Sep 8, 2023
  3. What is the typical byte count per quote? I can go from there. OPRA, today, runs 10-12m/second rates for EXTENDED periods of time. "Bursty" goes WELL above that, as you are aware. I'm trying to determine bandwidth requirements based on the size of an individual NORMALIZED quote record, with or without the NBBO appendage. Another aspect to this is hashing. It's quite valuable to have a once-established hash or index, provided by the quote provider, which is then included with all subsequent quote or trade messaging; otherwise each individual quote symbol needs to be hashed.
     
    Last edited: Sep 9, 2023
    #13     Sep 9, 2023
  4. Databento

    Databento Sponsor

    Our message sizes are bimodal around 56 and 80 bytes without compression. On the historical API they average about 11 bytes after compression. The specs are all open-sourced here.

    I don't think you can compare them apples-to-apples to Nanex's if that's what you're trying to do, though. To begin, these messages don't map 1:1 to OPRA or Nanex messages. There are also four major differences affecting size as far as I can tell:
    • Our messages include four nanosecond-resolution timestamps per event (record.rs:48,98,101,1420), whereas Nanex's appears to be lossy and includes only one millisecond- or microsecond-resolution timestamp per event (compare to OPRA spec 3.05.8); see the sketch after this comparison.
    • Our feed will exhibit significantly lower end-to-end latency. This is not only because the normalization is faster, but also because of certain hardware acceleration (FPGA offload, ToE, DPDK) and our network route diversity. In fact, our feed is faster on our Python client over the internet than the next fastest normalized internet feed we know of (not Nanex) on their C++ client.
    • Our normalization has to support MBO ("L3"), whereas Nanex's appears to support L1 and L2 only.
    • Nanex's format currently has more message types and fields for exchange codes and trade condition identifiers, because they support CTA/UTP/OPRA. We have a smaller API surface because we are more opinionated about the normalization and only cover what we think is required for a full order book simulation.
    This is not to say that one's better or worse. They've simply optimized their solution for compression whereas our solution is focused on lower latency and more granular information. The use cases are different and comparing them this way is somewhat pointless.
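
    To make the timestamp point above concrete, here's a rough sketch of reading those fields off a record with the Python client; the field names follow the open-sourced DBN spec, but treat the details as illustrative:

    ```python
    # Rough sketch: per-event timestamps carried on a normalized record.
    def print_timestamps(record) -> None:
        print(record.ts_event)     # matching-engine (event) timestamp, ns since epoch
        print(record.ts_recv)      # capture-server receive timestamp, ns since epoch
        print(record.ts_in_delta)  # exchange-sending time, as a ns delta before ts_recv
        print(getattr(record, "ts_out", None))  # gateway send timestamp (live feed only)
    ```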

    I'm not sure I follow your question, but I'll attempt to answer it. Every instrument on our feed has a unique identifier (record.rs:43) which can serve as an index if you want O(1) lookup. You could just & (2^32 - 1) over it, which will only cost a few CPU cycles.
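
    In code, that amounts to something like the following sketch (the dict could just as well be a flat array indexed by the masked ID):

    ```python
    # Sketch: O(1) per-instrument state keyed on the numeric instrument ID,
    # so no option symbol string needs to be hashed on the hot path.
    MASK = (1 << 32) - 1          # 2^32 - 1

    last_quote = {}               # masked instrument ID -> latest record

    def on_record(record) -> None:
        key = record.instrument_id & MASK   # costs a few CPU cycles at most
        last_quote[key] = record
    ```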

    Our API also lets you optionally specify individual symbols or option roots and does that filtering for you server-side, if you prefer that over the entire feed.
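
    For example, a sketch of filtering server-side to a single option root via parent symbology (the root shown is just an illustration):

    ```python
    import databento as db

    # Sketch: subscribe to one option root instead of the full feed.
    # "SPY.OPT" and the schema are illustrative.
    client = db.Live(key="YOUR_API_KEY")
    client.subscribe(
        dataset="OPRA.PILLAR",
        schema="mbp-1",
        stype_in="parent",        # resolve the root to all of its listings
        symbols=["SPY.OPT"],
    )
    ```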
     
    #14     Sep 9, 2023
  5. Thank you for the detailed update. I'm not interested in the published NBBO or your L3/MBO. I want the individual exchange quotes. All symbols. You're correct on Nanex time resolution. 1ms is the best they do I believe. It's not "lossy" beyond that from what I recall when I used it. Historical is of no interest to me. So if we go back to your 56-80 bytes/message, at 10m/second (this is a REAL number), you're somewhere between 5-8gbit/second. Not an issue at all if co-located, but absolutely not going to happen over the internet. I believe Nanex is delivering full OPRA in under 1gbit. "record.rs" would get the job done on that topic.
     
    #15     Sep 11, 2023
  6. Databento

    Databento Sponsor

    To clarify, OPRA doesn't have MBO, just individual exchange quotes. However, our normalization format has to accommodate MBO from US equities and other venues in one unified format, so this results in an order ID field that pads/bloats the message. I mentioned this because it goes to show my point about compression: we could technically have a dedicated schema to reduce message sizes, but it isn't the main thing we're optimizing for.

    No, it tops out around 1 Gbps at the moment, not 5-8 Gbps. See the percentile figures in my earlier post.

    FWIW, most of our users are on public cloud, where they do have 1+ Gbps of internet bandwidth, but at the price point of our full feed ($4,000/month), our dedicated interconnect solution is a more reliable option for users who want a wider pipe without spending on colocation.
     
    #16     Sep 11, 2023
  7. What's your pro fee rate? Also, do you have an Excel plug-in? :D
     
    #17     Sep 11, 2023
  8. Databento

    Databento Sponsor

    Our usage-based fees are the same for pros and non-pros. However, our professional users need to submit paperwork to OPRA, and it's expected that most of them will pay OPRA around $600 per month for licensing (see OPRA's fee schedule here).

    We don't currently have an Excel plugin, but we do support direct CSV downloads and CSV exports, both of which can then be opened in Excel.
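
    For example, a sketch of pulling a small historical slice and writing a CSV that Excel can open directly (the dataset, root, and dates are illustrative):

    ```python
    import databento as db

    # Sketch: download a small historical slice and export it as CSV.
    # Dataset, schema, symbols, and dates are illustrative only.
    hist = db.Historical("YOUR_API_KEY")
    data = hist.timeseries.get_range(
        dataset="OPRA.PILLAR",
        schema="trades",
        stype_in="parent",
        symbols=["SPY.OPT"],
        start="2023-09-07",
        end="2023-09-08",
    )
    data.to_csv("opra_spy_trades.csv")   # open the file directly in Excel
    ```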
     
    #18     Sep 13, 2023
  9. OPRA absolutely has NBBO appendages. All SIP feeds do. Not sure exactly what MBO is (your term). Not going to argue the bandwidth any further, but your math does NOT add up. Quote traffic is REGULARLY 10m/second+ for periods exceeding 5 seconds. That puts the bandwidth requirement above 5gbit without a doubt (unless you're conflating or dropping traffic). Let's just let this die, no reply needed. Thanks!
     
    #19     Sep 20, 2023
  10. Databento

    Databento Sponsor

    MBO is not NBBO, and it's a common industry term: see CME, Exegy, Quanthouse, OnixS.

    And while you keep conjuring numbers out of thin air to claim that the bandwidth requirement is 5 Gbps, it just isn't, and you haven't provided any concrete evidence for this blanket claim. I already shared exact numbers earlier that you've repeatedly ignored in your subsequent follow-ups, which gives me the feeling you're not engaging in good faith.

    These numbers are verifiable from our historical API since we use the same format for both historical and real-time. We also include sequence numbers to prove that there's no conflation. There's no "math" because this is just the empirical distribution of file sizes.
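
    As a sketch of how to verify this yourself (the date is illustrative, and the exact metadata call should be checked against the API reference):

    ```python
    import databento as db

    # Sketch: check the uncompressed size of one full session of the feed
    # from the historical side. Treat parameters as illustrative.
    hist = db.Historical("YOUR_API_KEY")
    size_bytes = hist.metadata.get_billable_size(
        dataset="OPRA.PILLAR",
        schema="mbp-1",
        symbols="ALL_SYMBOLS",
        start="2023-09-07",
        end="2023-09-08",
    )
    print(f"{size_bytes / 1e9:.1f} GB uncompressed for the session")
    ```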
     
    #20     Sep 20, 2023