Is there a minimum charge? For example, I only need to stream real-time TotalView data points every 15 minutes for the OHLC bars; that is less than a few MB per day. What would be the minimum charge for that, including all fees charged by the exchange?
Seems like the easiest solution for all parties. For some unsolicited advice, I would really stress transparency in licensing and explaining how archaic vendor agreements lead to the pricing model. I have done what seems like hours upon hours of research into data streaming and feel I've only scratched the surface of the nuances in the various systems. Had I no prior knowledge, I would have felt misled by "pay-per-GB" only to be hit by a $1,500 licensing fee the second I consume a stream, and it sounds like that group of people may be part of your target customer base. I checked out the website after getting off the waitlist and it is super sleek, props to your UI team. I haven't tested any of the datasets yet, however. Your MBO data seems to use some kind of canonical model rather than the original ITCH data -- is that right? It at least doesn't match my understanding of the NASDAQ ITCH format, based on what I saw in the sample data, and I cannot find any docs describing this format on the site either.
What do you mean by "consuming a stream"? Does an average user making RESTful requests for whole-market level 1 data every second count as consuming a stream?
I might have been too loose with my wording. Databento says you "only commit $1,500 per month" when you "activate the live data". This seems fairly straightforward if we're talking about the UI alone: you go click the button to enable the stream, agree to the licensing fee passed through to your account, and then use whatever mechanism in code to enable the stream (consuming it). How it works without the UI, however, seems fairly complicated... If you are an "average user using restful to get data of whole mkt level 1 each second" working only in code, how Databento will notify you that you owe a licensing fee is an interesting question. A non-display, non-professional fee on NASDAQ might run you only $1 per month, but Cboe will be much more expensive (>$1,000). Most likely you will always be required to "activate" the live data via their UI, thereby agreeing to the licensing fee, and only then be able to "consume" the feed in code. Just a guess; we'll have to wait for Databento's reply.
Because of the way our live protocol is designed, all of its users have real-time access. So even if you choose to wait T+15 minutes to request the data, you'll still be categorized as a real-time user. As for how the real-time fees apply thereafter, it depends. For Nasdaq TotalView-ITCH at the time of this writing:

- A non-professional pays $15/month regardless of display or non-display use.
- A professional pays $76/month for display use.
- A professional pays $375/month for non-display use.
- A firm may pay around $1,500 for internal redistribution, or simply use us as a vendor of record to report multiple professional subscribers at $76 or $375/month each.
- There are a number of other possible fees.

Our service charges on top of that will be similar to those for historical data, which currently start at $0.45/GB. But this portion should be negligible compared to the license fee, because OHLC bars take up very little bandwidth.
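To make the "negligible" claim concrete, here is a rough back-of-the-envelope estimate. The $15/month non-professional fee and the $0.45/GB rate are the figures quoted above; the daily bandwidth is an assumed upper bound from the original question ("less than a few MB per day"):

```python
# Rough monthly cost estimate for streaming OHLC bars off Nasdaq TotalView-ITCH.
# License fee and $/GB rate are the figures quoted above; bandwidth is assumed.

LICENSE_FEE_NONPRO = 15.00    # $/month, non-professional, display or non-display
USAGE_RATE_PER_GB = 0.45      # $/GB service charge, per the historical pricing
MB_PER_DAY = 3                # assumption: "less than a few MB per day"
TRADING_DAYS_PER_MONTH = 21

usage_gb = MB_PER_DAY * TRADING_DAYS_PER_MONTH / 1024
usage_cost = usage_gb * USAGE_RATE_PER_GB
total = LICENSE_FEE_NONPRO + usage_cost

print(f"usage: {usage_gb:.3f} GB -> ${usage_cost:.2f}; total: ${total:.2f}/month")
# -> usage: 0.062 GB -> $0.03; total: $15.03/month
```

So at this volume the bandwidth portion is a few cents; the license fee dominates entirely.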
Thanks for the advice. Yes, we can commiserate, and indeed our UI will make it very obvious what the licensing fees are; we've actually been working directly with our exchange sales reps on the design. Happy to hear that you like the UI. There are still many known issues and bugs to fix before we're ready for a public launch, so please bear with us if you encounter any, and feel free to contact me, use the live chat support, or reach us via our Slack community for assistance. The data is normalized. The easiest way to understand the normalization format and the various schemas is to go to the sample data section on the dataset detail page. The common fields are also documented under Schemas and conventions in our docs. The same normalization format is used across all asset classes and venues and between real-time and historical, so you can use the same code to parse/process our data from CME, NYSE, etc., for both real-time and historical use.
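To illustrate what that kind of normalization buys you (the field names below are hypothetical for illustration, not Databento's actual schema -- see their Schemas and conventions docs for the real fields), a single parser can handle records from any venue once they share one shape:

```python
from dataclasses import dataclass

# Hypothetical normalized OHLCV record; field names are illustrative only,
# not the actual Databento schema.
@dataclass
class OhlcvBar:
    ts_event: int    # event timestamp, e.g. nanoseconds since epoch
    symbol: str
    open: float
    high: float
    low: float
    close: float
    volume: int

def parse_bar(raw: dict) -> OhlcvBar:
    """One parser works for every venue because the fields are normalized."""
    return OhlcvBar(**raw)

# Records from different venues (CME, NYSE, ...) share the same shape,
# so the same code path processes both.
cme_bar = parse_bar({"ts_event": 1, "symbol": "ESZ3", "open": 4500.0,
                     "high": 4501.0, "low": 4499.5, "close": 4500.5,
                     "volume": 1200})
nyse_bar = parse_bar({"ts_event": 2, "symbol": "IBM", "open": 140.0,
                      "high": 140.3, "low": 139.9, "close": 140.1,
                      "volume": 800})
```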
The way we think of it is as follows: the live license for the data needs to be activated on the UI before you can start streaming it via our live APIs (raw TCP or WebSocket). Activation mainly involves electronic execution and submission of the license agreement(s) to the venue. Otherwise your subscription API call will just return an authorization error telling you that you haven't activated your license. There will be three ways to reach the license activation flow:

- If you look at the portal nav now, there's a disabled section labeled "Licensing". You can activate live access for any dataset there.
- On the dataset detail page in our data catalog, there will be a license section and a button that starts the activation flow.
- The API documentation will also point you there.
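From the client's side, the flow described above amounts to: try to subscribe, and if the license hasn't been activated, the call fails with an authorization error that points you back to the portal. A minimal sketch, assuming a hypothetical client and error type (`LiveClient` and `AuthorizationError` are illustrative names, not a real API):

```python
# Hypothetical client-side sketch of the activation check described above.
# LiveClient and AuthorizationError are illustrative, not a real API; the
# activated-dataset set stands in for server-side license state.

class AuthorizationError(Exception):
    """Raised when the live license hasn't been activated in the portal."""

class LiveClient:
    def __init__(self, activated_datasets: set):
        self._activated = activated_datasets

    def subscribe(self, dataset: str, schema: str) -> str:
        if dataset not in self._activated:
            raise AuthorizationError(
                f"License for {dataset!r} not activated; "
                "complete the activation flow in the portal first.")
        return f"subscribed to {dataset}/{schema}"

client = LiveClient(activated_datasets={"XNAS.ITCH"})

# Activated dataset: the subscription succeeds.
client.subscribe("XNAS.ITCH", "ohlcv-1s")

# Non-activated dataset: the call fails with a pointer to the licensing page.
try:
    client.subscribe("XCME.MDP3", "ohlcv-1s")
except AuthorizationError as e:
    print(e)
```

The point being that a code-only user can't silently incur a license fee: the subscribe call simply refuses until activation has happened in the UI.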
Veering a bit off-topic here, but I thought it might be informative to share our thoughts on REST. Our live APIs will be streaming APIs: you establish a session, send a subscription message, and listen for messages. So it won't be idiomatic to keep polling them every second; it's more efficient to subscribe just once for OHLCV-1s and listen. None of our public APIs are RESTful. Our historical API is RPC-like and non-RESTful, though it uses the same protocol (HTTP) as most REST APIs.

This is an intentional design decision: we don't think that market data is well expressed in a RESTful model. REST at its core is about manipulating entities. For example, a use case that's mostly CRUD on elementary resources, like a dashboard backend API for adding and deleting users, is a very good fit for REST. However, once you start doing things involving multiple resources (e.g. datasets, schemas, symbols) at once across multiple domains (e.g. resolve symbols, fetch this range, merge these symbols, then submit this batch job), your REST endpoints become either a word salad (unnecessarily complicated) or too rigid (inexpressive). Market data is a good example: it feels awkward to force things like schemas/formats, symbols, and dates into a resource-oriented model, but neither do they really behave as the simple sorts and filters that are treated as query parameters under RESTful convention. Our team has worked with many internal APIs at top trading firms, and there's a good reason we've rarely seen RESTful APIs in use there.
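The subscribe-once pattern above, as opposed to polling every second, can be sketched as follows. The session here is simulated with a generator so the example is self-contained; a real client would hold a raw TCP or WebSocket connection, and the message shape is assumed, not Databento's actual wire format:

```python
# Sketch of the subscribe-once, listen-forever streaming pattern described
# above. fake_session stands in for a live TCP/WebSocket connection that
# pushes messages after a single subscription; its message shape is assumed.

def fake_session(subscription: dict):
    """Simulated live session: one subscribe, then the server pushes bars."""
    assert subscription["schema"] == "ohlcv-1s"
    for i in range(3):  # a real stream would yield indefinitely
        yield {"ts": i, "open": 100.0 + i, "close": 100.5 + i}

# One subscription message, then just consume what the server pushes --
# no per-second polling loop issuing fresh requests.
bars = [msg for msg in fake_session({"schema": "ohlcv-1s",
                                     "symbols": ["AAPL"]})]
print(len(bars))
```

The contrast with REST polling is that the request happens once; everything after that is server-initiated, which is both lower latency and cheaper for 1-second bars.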
Was GraphQL part of the original API design debate? I could see pricing calculations becoming overly complex with GraphQL, but it puts all the power in the client's hands.
Yes, we considered GraphQL, but for market data we felt it's actually less expressive than the kind of proprietary "streaming query" frameworks that many of the top prop firms have and that we envision adding later on. We might migrate the semi-private API used by our portal SPA to GraphQL, though.