Databento launches real-time and historical APIs for US equity options at $1/month

Discussion in 'Announcements' started by Databento, Aug 30, 2023.

  1. Don't see how this is possible. Let's say you're REALLY good and get an individual quote down to 15 bytes. At 20m/second, that's almost 3gbit. Just for one side... Is there a "blanket subscription" method whereby one can say "give me EVERYTHING"?

    AND I'm probably radically overoptimistic on your compression. Probably well over 15 bytes/quote. I haven't looked at the tech documents though... Perhaps you can shortcut the "average bytes per quote msg" assumption of 15... Past that, the math is trivial. The true leader in compression is Nanex: 4.5 bytes/quote average, if I recall correctly. Amazing technology under the hood there.
     
    #11     Sep 7, 2023
  2. Databento

    Databento Sponsor

    There are several levels of misinformation here.
    • Peak rate is not average rate.
      • OPRA, like all multicast feeds, is bursty and has an average rate much lower than its peak rate.
      • For a sense of scale, the 50th/75th/100th percentile hourly slices of our feed yesterday were 0 GB, 197.8 GB (440 Mbps average), and 490 GB (1.08 Gbps average) respectively. You don't need compression to receive this over a dedicated interconnect and TCP.
    • We currently only normalize the data; strictly speaking, we don't compress it.
    One side is the correct measurement here because it's about the size of the feed after packet arbitration. If you count both sides, you're double-counting the throughput required for the feed.
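
    For anyone who wants to check the arithmetic, converting a slice size into an average bit rate is trivial; a minimal sketch in Python, using the figures above:

    ```python
    # Quick sanity check: average bit rate implied by an hourly slice size.
    def avg_mbps(slice_gb: float, slice_seconds: float = 3600) -> float:
        """Gigabytes delivered over a slice -> average megabits per second."""
        return slice_gb * 8 * 1000 / slice_seconds

    print(avg_mbps(197.8))  # ~440 Mbps (75th percentile slice)
    print(avg_mbps(490.0))  # ~1089 Mbps, i.e. roughly 1.08-1.09 Gbps (peak slice)
    ```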

    Yes, we do have a "blanket subscription" method. You simply pass `symbols='ALL_SYMBOLS'`, as in the sketch below.
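
    A minimal sketch with our Python client (the dataset code and schema shown are illustrative; check the docs for the names available to you):

    ```python
    import databento as db

    # Sketch: subscribe to the entire OPRA feed ("blanket subscription").
    # Dataset code and schema are illustrative.
    client = db.Live(key="YOUR_API_KEY")
    client.subscribe(
        dataset="OPRA.PILLAR",
        schema="mbp-1",          # per-exchange top-of-book quotes
        symbols="ALL_SYMBOLS",
    )

    for record in client:
        ...  # handle each normalized record as it arrives
    ```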

    If Nanex is working well for you, I think that's great.
     
    #12     Sep 8, 2023
  3. What is the typical byte count per quote? I can go from there. OPRA, today, runs 10-12m/second rates for EXTENDED periods of time. "Bursty" goes WELL above that, as you are aware. I'm trying to determine bandwidth requirements based on the size of an individual NORMALIZED quote record, with or without the NBBO appendage. Another aspect to this is hashing. It's quite valuable to have a once-established hash or index, provided by the quote provider, which is then included with all subsequent quote or trade messaging; otherwise each individual quote symbol needs to be hashed.
     
    Last edited: Sep 9, 2023
    #13     Sep 9, 2023
  4. Databento

    Databento Sponsor

    Our message sizes are bimodal around 56 and 80 bytes without compression. On the historical API they average about 11 bytes after compression. The specs are all open-sourced here.

    I don't think you can compare them apples-to-apples to Nanex's if that's what you're trying to do, though. To begin, these messages don't map 1:1 to OPRA or Nanex messages. There are also four major differences affecting size as far as I can tell:
    • Our messages include four nanosecond-resolution timestamps per event (record.rs:48,98,101,1420), whereas Nanex's appears to be lossy and includes only one millisecond- or microsecond-resolution timestamp per event (compare to OPRA spec 3.05.8); see the sketch after this comparison.
    • Our feed will exhibit significantly lower end-to-end latency. This is not only because the normalization is faster, but also because of certain hardware acceleration (FPGA offload, ToE, DPDK) and our network route diversity. In fact, our feed is faster on our Python client over the internet than the next fastest normalized internet feed we know of (not Nanex) on their C++ client.
    • Our normalization has to support MBO ("L3"), whereas Nanex's appears to support L1 and L2 only.
    • Nanex's format currently has more message types and fields for exchange codes and trade condition identifiers, because they support CTA/UTP/OPRA. We have a smaller API surface because we are more opinionated about the normalization and only cover what we think is required for a full order book simulation.
    This is not to say that one's better or worse. They've simply optimized their solution for compression whereas our solution is focused on lower latency and more granular information. The use cases are different and comparing them this way is somewhat pointless.
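
    To make the timestamp point above concrete, here's a rough sketch of reading those fields off a record with the Python client; the field names follow the open-sourced DBN spec, but treat the details as illustrative:

    ```python
    # Rough sketch: per-event timestamps carried on a normalized record.
    def print_timestamps(record) -> None:
        print(record.ts_event)     # matching-engine (event) timestamp, ns since epoch
        print(record.ts_recv)      # capture-server receive timestamp, ns since epoch
        print(record.ts_in_delta)  # exchange-sending time, as a ns delta before ts_recv
        print(getattr(record, "ts_out", None))  # gateway send timestamp (live feed only)
    ```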

    I'm not sure I follow your question, but I'll attempt to answer it. Every instrument on our feed has a unique identifier (record.rs:43) which can serve as an index if you want O(1) lookup. You could just & (2^32 - 1) over it, which will only cost a few CPU cycles.
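
    In code, that amounts to something like the following sketch (the dict could just as well be a flat array indexed by the masked ID):

    ```python
    # Sketch: O(1) per-instrument state keyed on the numeric instrument ID,
    # so no option symbol string needs to be hashed on the hot path.
    MASK = (1 << 32) - 1          # 2^32 - 1

    last_quote = {}               # masked instrument ID -> latest record

    def on_record(record) -> None:
        key = record.instrument_id & MASK   # costs a few CPU cycles at most
        last_quote[key] = record
    ```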

    Our API also lets you optionally specify individual symbols or option roots and does that filtering for you server-side, if you prefer that over the entire feed.
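
    For example, a sketch of filtering server-side to a single option root via parent symbology (the root shown is just an illustration):

    ```python
    import databento as db

    # Sketch: subscribe to one option root instead of the full feed.
    # "SPY.OPT" and the schema are illustrative.
    client = db.Live(key="YOUR_API_KEY")
    client.subscribe(
        dataset="OPRA.PILLAR",
        schema="mbp-1",
        stype_in="parent",        # resolve the root to all of its listings
        symbols=["SPY.OPT"],
    )
    ```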
     
    #14     Sep 9, 2023
  5. Thank you for the detailed update. I'm not interested in the published NBBO or your L3/MBO. I want the individual exchange quotes. All symbols. You're correct on Nanex time resolution. 1ms is the best they do I believe. It's not "lossy" beyond that from what I recall when I used it. Historical is of no interest to me. So if we go back to your 56-80 bytes/message, at 10m/second (this is a REAL number), you're somewhere between 5-8gbit/second. Not an issue at all if co-located, but absolutely not going to happen over the internet. I believe Nanex is delivering full OPRA in under 1gbit. "record.rs" would get the job done on that topic.
     
    #15     Sep 11, 2023
  6. Databento

    Databento Sponsor

    To clarify, OPRA doesn't have MBO, just individual exchange quotes. However, our normalization format has to accommodate MBO from US equities and other venues in one unified format, so this results in an order ID field that pads/bloats the message. I mentioned this because it goes to show my point about compression: we could technically have a dedicated schema to reduce message sizes, but it isn't the main thing we're optimizing for.

    No, it tops out around 1 Gbps at the moment, not 5-8 Gbps. See the percentile figures in my earlier post.

    FWIW, most of our users are on public cloud, where they do have 1+ Gbps of internet bandwidth, but at the price point of our full feed ($4,000/month), our dedicated interconnect solution is a more reliable option for users who want a wider pipe without spending on colocation.
     
    #16     Sep 11, 2023
  7. What's your pro fee rate? Also, do you have an Excel plug-in? :D
     
    #17     Sep 11, 2023
  8. Databento

    Databento Sponsor

    Our usage-based fees are the same for pros and non-pros. However, our professional users need to submit paperwork to OPRA, and it's expected that most of them will pay OPRA around $600 per month for licensing (see OPRA's fee schedule here).

    We don't currently have an Excel plugin, but we do support direct CSV downloads and CSV exports, both of which can then be opened in Excel.
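
    For example, a sketch of pulling a small historical slice and writing a CSV that Excel can open directly (the dataset, root, and dates are illustrative):

    ```python
    import databento as db

    # Sketch: download a small historical slice and export it as CSV.
    # Dataset, schema, symbols, and dates are illustrative only.
    hist = db.Historical("YOUR_API_KEY")
    data = hist.timeseries.get_range(
        dataset="OPRA.PILLAR",
        schema="trades",
        stype_in="parent",
        symbols=["SPY.OPT"],
        start="2023-09-07",
        end="2023-09-08",
    )
    data.to_csv("opra_spy_trades.csv")   # open the file directly in Excel
    ```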
     
    #18     Sep 13, 2023
  9. OPRA absolutely has NBBO appendages. All SIP feeds do. Not sure exactly what MBO is (your term). Not going to argue the bandwidth any further, but your math does NOT add up. Quote traffic is REGULARLY 10m/second+ for periods exceeding 5 seconds. That puts the bandwidth requirement above 5gbit without a doubt (unless you're conflating or dropping traffic). Let's just let this die, no reply needed. Thanks!
     
    #19     Sep 20, 2023
  10. Databento

    Databento Sponsor

    MBO is not NBBO, and it's a common industry term: see CME, Exegy, Quanthouse, OnixS.

    And while you keep conjuring numbers out of thin air to claim that the bandwidth requirement is 5 Gbps, it just isn't, and you haven't provided any concrete evidence for this blanket claim. I already shared exact numbers earlier that you've repeatedly ignored in your subsequent follow-ups, which gives me the feeling you're not engaging in good faith.

    These numbers are verifiable from our historical API since we use the same format for both historical and real-time. We also include sequence numbers to prove that there's no conflation. There's no "math" because this is just the empirical distribution of file sizes.
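
    As a sketch of how to verify this yourself (the date is illustrative, and the exact metadata call should be checked against the API reference):

    ```python
    import databento as db

    # Sketch: check the uncompressed size of one full session of the feed
    # from the historical side. Treat parameters as illustrative.
    hist = db.Historical("YOUR_API_KEY")
    size_bytes = hist.metadata.get_billable_size(
        dataset="OPRA.PILLAR",
        schema="mbp-1",
        symbols="ALL_SYMBOLS",
        start="2023-09-07",
        end="2023-09-08",
    )
    print(f"{size_bytes / 1e9:.1f} GB uncompressed for the session")
    ```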
     
    #20     Sep 20, 2023