Databento - Direct market data feeds for everyone, built by former HFT traders

Discussion in 'Announcements' started by Databento, Jul 12, 2022.

  1. Databento

    Databento Sponsor

    Yes. The reason being that we see our main value-add as the infrastructure/technology layer, not as a data licensor.

    To be specific, we don't impose any licensing restrictions in our user agreements. For almost every market, once data is older than 24 hours, there's nothing on the market's end that restricts a user from redistributing it or redistributing derived data based on it. I can only think of one market operator (CME) that restricts historical redistribution. If your current vendor does impose restrictions on historical redistribution, there's a high chance those restrictions are coming from the vendor rather than the market.

    As for redistributing live data (24 hours old or newer), that's where markets generally require you to pay for a redistribution license. We simply facilitate the process of obtaining the redistribution license from the market operator and pass through those fees. If a user has a redistribution license from the market, we don't impose any limitations on their redistributing the data. It's quite cost-prohibitive, though: often in excess of $5k MRC per feed for external redistribution. If the purpose is merely internal redistribution, we recommend that most of our users use us as a vendor-of-record and break down the usage by subscriber count.
     
    #21     Jul 14, 2022
    shuraver likes this.
  2. Databento

    Databento Sponsor

    Do you mean metrics data? Or metrics about the service itself (latency, amount of data, etc.)?

    On exchanges that do provide static data like EOD volume, open interest, etc., we pass those on. On top of that, we compute and provide daily liquidity metrics (e.g., event counts, percentiles of touch depth, average spread).
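    To make the metrics above concrete, here's a rough sketch of how daily liquidity metrics like these could be computed from top-of-book quote records. The field names (`bid_px`, `ask_px`, `bid_sz`, `ask_sz`) and the nearest-rank percentile are illustrative choices for this example, not Databento's actual schema or methodology:

    ```python
    from statistics import mean

    def _percentile(sorted_vals, p):
        """Nearest-rank percentile (p in 0..100) on a pre-sorted list."""
        k = round(p / 100 * (len(sorted_vals) - 1))
        return sorted_vals[k]

    def daily_liquidity_metrics(quotes):
        """quotes: iterable of dicts with bid_px, ask_px, bid_sz, ask_sz."""
        spreads, depths = [], []
        for q in quotes:
            spreads.append(q["ask_px"] - q["bid_px"])
            depths.append(q["bid_sz"] + q["ask_sz"])  # size at the touch
        depths.sort()
        return {
            "event_count": len(spreads),
            "avg_spread": mean(spreads),
            "touch_depth_p50": _percentile(depths, 50),
            "touch_depth_p90": _percentile(depths, 90),
        }
    ```

    In practice these would be computed per instrument per session day, over far larger event counts.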
     
    #22     Jul 14, 2022
    shuraver likes this.
  3. Databento

    Databento Sponsor

    We haven't publicly finalized a launch date yet, but it's likely to be between Aug 23 and Sep 13.

    It's quite hard to time our announcements, and I understand that causes frustration for some folks (@M.W.); we hear you. Keep in mind it's mainly because of the sheer number of moving pieces. (1) We plan to do a mass regeneration of a large chunk of data (6+ PB) to make sure everything's clean, and that just takes a long time and has a lot of variance because of the amount of data involved. (2) We have a fairly large waitlist and don't want the service to be underprovisioned for users we've already onboarded. All of our storage and IP transit is self-hosted, so scaling everything is a little trickier than if we were on a cloud platform.
     
    #23     Jul 14, 2022
    jtrader33 likes this.
  4. 2rosy

    2rosy

    I mean latency metrics on the service. Also, does Databento subscribe to any competitors' feeds to compare metrics? For the realtime feed, does the SDK handle reconnects, missed messages, etc.? Will UDP be offered in the future? How large are the messages, and what bandwidth is recommended?
     
    #24     Jul 14, 2022
  5. Databento

    Databento Sponsor

    Here's a diagram breaking down our latency. It's currently ~41 µs through the stack up to our load balancer and firewall. The diagram is a little outdated and shows <2 ms for the load balancer and firewall, but the min/mean/max through that segment is now 0.7/63.7/310 µs after an upgrade we made two weeks ago.

    [IMG: latency breakdown diagram]

    Most of the latency outside our stack will be dominated by distance and your network provider. You can ping/traceroute the public loopback for our Aurora I load balancer (dc3.databento.com), for example. If you're hitting our CME gateway from Aurora I/II or from 350 E. Cermak, it will likely be sub-millisecond or about 1 ms one-way, respectively.

    If you're hitting our NY4 gateway from anywhere on the Equinix campus in Secaucus, it will likely be sub-millisecond one-way. (You'll probably get the best numbers in NY4 and NY2, where we terminate IP transit.)

    We haven't really optimized for latency since the service is initially targeted at internet users. We'll probably push for about 5 µs (2 switch hops, 2 PCIe hops, and some time in userland on the host CPU) when we open up to colo users as well.

    Not really, but our engineers have mostly worked with Activ, Bloomberg B-PIPE, Celoxica, MayStreet, Redline, QuantHouse, and SR Labs (now Exegy). It wouldn't be in good spirit to critique their stacks.

    It uses TCP as the transport protocol, so you won't miss messages unless you lose the connection outright. We provide an intraday replay covering up to the last 24 hours that you can opt into upon subscription, so that can be used for recovery.
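    The recovery pattern described above (TCP stream plus intraday replay) can be sketched roughly as follows. The interface is invented for illustration: `connect(replay_from=...)` stands in for whatever the actual client library exposes, returning an iterable of `(ts_event, payload)` tuples:

    ```python
    import time

    def stream_with_recovery(connect, max_retries=5,
                             backoff=lambda n: min(2 ** n, 30)):
        """Yield (ts_event, payload) messages from a live TCP feed.

        On disconnect, reconnect and request an intraday replay starting at
        the last event timestamp seen, dropping the duplicated overlap, so
        no events are silently lost.
        """
        last_ts = None
        retries = 0
        while retries <= max_retries:
            try:
                conn = connect(replay_from=last_ts)
                for ts, payload in conn:
                    if last_ts is not None and ts <= last_ts:
                        continue          # duplicate from the replay overlap
                    last_ts = ts
                    retries = 0           # a healthy stream resets the backoff
                    yield ts, payload
                return                    # clean end of stream
            except ConnectionError:
                retries += 1
                time.sleep(backoff(retries))
        raise ConnectionError(f"gave up after {max_retries} reconnect attempts")
    ```

    The key idea is that replay-from-timestamp plus deduplication gives at-least-once delivery over TCP without the client needing gap detection of its own.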

    Unfortunately too far away for us to commit right now.

    Most of our binary-encoded messages are ~28 bytes compressed in real-time and ~13 bytes compressed for historical. The uncompressed messages are mostly 56 bytes per event; the format is designed so that each message fits within a single cache line.
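    For intuition, here's a hypothetical fixed-width record packed into 56 bytes. The field layout below is invented for this example (it is not Databento's actual encoding); it just shows how a message can be sized to fit a 64-byte cache line with room to spare:

    ```python
    import struct

    # A made-up 56-byte market-data record layout.
    RECORD = struct.Struct(
        "<"     # little-endian, no implicit padding
        "Q"     # ts_event: nanoseconds since epoch        (8)
        "I"     # instrument_id                            (4)
        "BBBx"  # action, side, flags, 1 byte padding      (4)
        "q"     # price: fixed-point integer               (8)
        "II"    # size, sequence                           (8)
        "qq"    # bid_px, ask_px: fixed-point integers    (16)
        "II"    # bid_sz, ask_sz                           (8)
    )
    assert RECORD.size == 56  # 8 bytes to spare in a 64-byte cache line
    ```

    Fixed-point integer prices and explicit padding keep the record a deterministic size with no parsing ambiguity.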

    Most equities exchanges are probably fine with 5 MB/s. But take full order book on our beefiest feed: that's about 4 billion events per day, with maybe 30% of it clustered in a one-hour period, so you'd need about 20 MB/s to not fall behind. Alternatively, you can pick a subset of symbols to manage the bandwidth (we allow any arbitrary combination of up to 1,000 instruments per subscription if you decide not to listen to every symbol on the market).
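    The 20 MB/s figure follows from simple arithmetic on those numbers (4 billion events/day, ~30% in the busiest hour, 56 uncompressed bytes per event):

    ```python
    EVENTS_PER_DAY = 4_000_000_000  # full order book on the busiest feed
    PEAK_FRACTION = 0.30            # share of the day's events in the peak hour
    BYTES_PER_EVENT = 56            # uncompressed binary record size

    peak_events_per_sec = EVENTS_PER_DAY * PEAK_FRACTION / 3600
    peak_mb_per_sec = peak_events_per_sec * BYTES_PER_EVENT / 1e6
    print(f"{peak_events_per_sec:,.0f} events/s -> {peak_mb_per_sec:.1f} MB/s")
    # -> 333,333 events/s -> 18.7 MB/s, roughly the 20 MB/s quoted above
    ```

    Compression would reduce the wire rate below this, so 20 MB/s is a conservative provisioning target for the peak hour.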

    We also expose CSV and JSON encodings for convenience, but those take up more bandwidth; our client libraries use the binary encoding all of the time.
     
    #25     Jul 14, 2022
    shuraver likes this.
  6. Robert Morse

    Robert Morse Sponsor

    Sophia - Can we assume your live streaming data is subject to non-display fees for equities, options, and futures?
     
    #26     Jul 14, 2022
  7. Databento

    Databento Sponsor

    That depends on the exact market in question.

    Our users will often incur non-display fees for US equities, options, and futures feeds. But what we've seen is that there's a common misunderstanding here among many retail traders (and even vendors): several exchanges determine non-display use by whether you have a designated session port, not by the means of data consumption. This means that if you're feeding the data via API into charting software and automating your execution through that, or you're running an autospreader, or you're consuming the data via API in your own custom application but you're not connected to the exchange on its extranet, there's a fair chance those non-display fees do not apply.

    Counterintuitively, this also means if you're running an execution gateway software that requires you to register a session port, even if you're not actually automating your execution, you might well be subject to non-display fees.

    We take each of these idiosyncratic rules into account rather than adopting a uniform policy when we process our users' applications for real-time licensing.

    Also, for US equities, keep in mind that we only provide prop feeds and don't deal with the UTP/CTA SIPs, so the policies will vary slightly from what you might see from most retail data vendors.

    Lastly, for OTC markets like cash FX, in some cases we have a special agreement with the ECN that lets us bypass typical non-display fees.
     
    #27     Jul 14, 2022
    shuraver and cruisecontrol like this.
  8. Thanks for your reply.

    Will it be possible to select / download ONLY certain types of messages and not pay for the other types?
    In historical? in real time?

    Regarding RAW data: at least for historical, raw packets still compress very well with a big enough block size and a decent codec. If raw isn't provided, then it's on you to make sure every element that any user cares about is in the schema.
     
    #28     Jul 14, 2022
    Databento likes this.
  9. Robert Morse

    Robert Morse Sponsor

    Thank you. Is it also fair to say that the live data is more of an institutional offering and not for the retail market, and that the backtesting data, along with the futures live and backtesting data, might be more apt to target both retail and institutional clients from a cost standpoint?

     
    #29     Jul 14, 2022
  10. Databento

    Databento Sponsor

    No problem.

    Yes, you can select only a specific schema, in both historical and real-time. That's how we homogenize the solution for both institutional and retail users: a retail user can subscribe to only the few things they need to keep costs low, whereas our institutional users often want everything.

    As a side note, if you're curious about the technical details: that's actually one of the reasons our gateway is currently so slow (40+ µs). The parsing and book building happen in sub-300 ns, but we're doing a lot of bookkeeping to export all of the schemas, do all of the transcoding, and filter out separate channels and streams customized for each user.
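    A toy illustration of that per-user filtering step: each subscriber registers the schemas and symbols it wants, and the gateway fans each event out only to matching subscriptions. The class and names here are invented for this sketch; the real gateway does this far more efficiently:

    ```python
    from collections import defaultdict

    class FanOutGateway:
        """Route each event only to subscriptions matching (schema, symbol)."""

        def __init__(self):
            self._subs = defaultdict(list)  # (schema, symbol) -> list of queues

        def subscribe(self, schema, symbols):
            queue = []
            for sym in symbols:
                self._subs[(schema, sym)].append(queue)
            return queue

        def publish(self, schema, symbol, event):
            for queue in self._subs.get((schema, symbol), []):
                queue.append(event)

    gw = FanOutGateway()
    retail = gw.subscribe("trades", ["ESU2"])      # one schema, one symbol
    inst = gw.subscribe("mbo", ["ESU2", "NQU2"])   # full book, more symbols
    gw.publish("mbo", "ESU2", {"px": 4100.25})
    gw.publish("trades", "ESU2", {"px": 4100.25, "sz": 2})
    # retail receives only the trade; inst receives only the book event
    ```

    The cost the post describes comes from doing this matching, plus transcoding, per event across every subscriber.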

    Oh, I think I understand what you're suggesting. I misunderstood and thought you meant the raw multicast packets. But the raw payload, or simply one half of the packets after A/B arbitration, might be possible. That's a workable idea; I'll let the team know and see if it's possible. Thanks for suggesting it!
     
    Last edited: Jul 14, 2022
    #30     Jul 14, 2022
    shuraver and cruisecontrol like this.