OrderBook 'sweep' event-depth update nuances needed

Discussion in 'Order Execution' started by jelite, Jul 6, 2013.

  1. jelite


    I am interested in learning some nuances about how orderbook (DOM) updates along with time and sales updates might be affected
    during 'fast conditions'.
    I don't have a precise definition of a 'fast' condition but think of a market taking out 3
    consecutive price levels within a matter of a millisecond (all trades at those prices are reported by the exchange with
    the same millisecond timestamp). This is an almost everyday occurrence in, say, crude oil trading.

    Let's look at an example (again, CL) to be able to state some of my questions.

    Let's assume (big assumption-but don't question it for now) I have all sequential trade/DOM updates as they come from CME.
    Order book snapshot looks as follows:
    95.87, 50
    95.86, 33
    95.85, 60
    95.84, 34
    95.83, 26
    95.82, 5
    95.81, 7
    95.80, 39

    Now I see, in this order, a sequence of trades (long list) that add up to the following:
    5 contracts at 95.82,
    26 at 95.83,
    34 at 95.84,
    80 at 95.85 (not 60!),
    40 at 95.86 (not 33),
    17 at 95.87
    After this sequence of trade updates I get a Level2 update that looks as follows
    95.87, 33
    95.86, 1
    All of this transpires within a fraction of a second. So 5 ask levels were 'swept' (95.82-95.86) and one was partially executed (95.87).
    After that, trading continues at a slower pace where orderbook and Level1 updates 'keep up' with the trades occurring.

    1. How come there was more volume executed at 95.85 and 95.86 than showed in order book 'just' before this event?

    Let's assume no icebergs were sitting there. Then clearly, this would point to me not getting some order book updates
    while this was happening as someone provided additional liquidity at those levels that was not reflected in order book
    updates during that time. Even if there were icebergs sitting there, I would likely see some 'intermediate quotes' at
    those levels, reflecting the exhaustion/replenishment of size at those levels.
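    To make the mismatch in question '1.' concrete, here is a small sketch (my own helper names, figures taken from the example above) that compares posted book size against executed volume per level and flags where more traded than was ever shown resting:

    ```python
    # Hypothetical sketch: compare the posted ask ladder just before the sweep
    # with the volume actually printed at each level during it. The numbers
    # come from the CL example above; the function name is illustrative.

    posted = {  # ask side just before the sweep: price -> posted size
        95.82: 5, 95.83: 26, 95.84: 34, 95.85: 60, 95.86: 33, 95.87: 50,
    }
    executed = {  # volume printed at each level during the sweep
        95.82: 5, 95.83: 26, 95.84: 34, 95.85: 80, 95.86: 40, 95.87: 17,
    }

    def hidden_liquidity(posted, executed):
        """Return levels where more traded than was ever shown resting."""
        return {p: executed[p] - posted[p]
                for p in executed if executed[p] > posted.get(p, 0)}

    print(hidden_liquidity(posted, executed))  # {95.85: 20, 95.86: 7}
    ```

    The 20 extra contracts at 95.85 and 7 at 95.86 are exactly the sizes the posted book never showed.
    
    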

    I can find so many of these situations where, during fast market conditions, I see a mismatch between posted size and
    executed size (including situations where executed size < posted size, i.e., pulling of orders), that I can only draw one of two conclusions:

    a) my data is always incomplete during that time (not likely, as there is no issue of latency here; the only important
    criterion is to get all updates sequentially, even with a lag)
    b) I don't fully understand the algorithm by which CME disseminates their order book updates, especially in such 'fast' conditions

    Does anyone have a good explanation for question '1.' above? Knowing more about b) would be great too...
    I am hoping hft, NetTecture, and others with HFT knowledge will be able to answer this.
    I can provide concrete examples with times down to a millisecond for anyone willing to discuss this with me (feel free to PM).
  2. rwk


    I have been seeing this too, and I assume I am not seeing everything the matching engine has.

    @jelite: You didn't tell us what data feed you're using. My broker aggregates the data, but I understand Globex is also aggregating data before distributing it. I am guessing they have non-aggregated data for a price, but it might require co-location to use it effectively.

    That leaves me wondering whether there is any point in analyzing book and tape with retail data and software.

    @jelite: Let us know (or PM me), if you learn anything.
  3. jelite


    I use IQFeed data - via their api, so it's not broker data. They claim not to filter the data as they get them directly from the exchange. Their Level1 data are of the form (roughly):
    last trade price, last trade quantity, bid price, bid quantity, ask price, ask quantity

    Bid/ask prices and quantities are ones 'at the time of trade'.

    While market is sweeping the book (fast move), as in the situation I described, you would see something like this in the trade list from the feed (all these would have the same millisecond timestamp):
    95.82, 2, 95.81, 7, 95.82, 5
    95.82, 1, 95.81, 7, 95.82, 5
    95.82, 2, 95.81, 7, 95.82, 5
    95.83, 1, 95.81, 7, 95.82, 5
    95.83, 3, 95.81, 7, 95.82, 5
    95.83, 1, 95.81, 7, 95.82, 5
    95.84, 2, 95.81, 7, 95.82, 5
    95.87, 3, 95.81, 7, 95.82, 5
    95.87, 1, 95.86, 1, 95.87, 33

    So, in that situation, 'bid/ask/quantities' are 'frozen' until at some point the data catches up. I presented this situation (with precise trade list) to IQFeed people and they confirmed that this is how the exchange (CME) sends it to them. So it seems at least on the 'trade' feed (Level1 data), the exchange itself doesn't report every bid/ask change in these fast conditions. Some color on this would be great too from people who know the details.
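    The 'frozen quote' pattern in the trade list above can be detected mechanically. A sketch, assuming the message layout described in the post (trade price, trade size, bid, bid size, ask, ask size, all sharing one millisecond timestamp):

    ```python
    # Flags ticks where the trade printed above the attached ask while the
    # quote fields never moved, i.e. the quote side of the feed was "frozen".
    # Tick values are copied from the sequence in the post.

    ticks = [
        # (trade_px, trade_sz, bid, bid_sz, ask, ask_sz)
        (95.82, 2, 95.81, 7, 95.82, 5),
        (95.83, 1, 95.81, 7, 95.82, 5),
        (95.84, 2, 95.81, 7, 95.82, 5),
        (95.87, 3, 95.81, 7, 95.82, 5),
        (95.87, 1, 95.86, 1, 95.87, 33),  # quote finally catches up
    ]

    def frozen_quote_ticks(ticks):
        """Indices where the trade price exceeds the quoted ask."""
        return [i for i, (px, _, _, _, ask, _) in enumerate(ticks) if px > ask]

    print(frozen_quote_ticks(ticks))  # [1, 2, 3]
    ```
    
    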

    It seems to me that in such conditions Level2 feed is also 'frozen' but I don't have a confirmation of this.

    Such details are hard to come by for outsiders... like me but are very valuable in assessing where liquidity enters/leaves the market...
  4. rwk


    I question whether the data as it comes from Globex has both trades and inside quotes in the same transaction. Do you know this for certain?

    I'm using IB's data. For quotes, I get a separate size update for each side, one per price level (0 through 9, with 0 being best). Most quotes merely reflect a new order, a change, or a cancellation. But I get 20 transactions every time there is an uptick or downtick. To save bandwidth, I am subscribing to only 5 levels, but I am also monitoring two markets at a time.

    For trades, I get price, size, time, cumulative volume, VWAP, and a flag indicating whether the update is aggregated. Quotes and trades get mixed together into a serial stream. I seem to be getting somewhere around 250 updates per second. It took some fairly complex programming to reconstruct the limit order book, but the tape was pretty trivial.
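    The per-level reconstruction can be sketched roughly as follows. This assumes the message shape described above (side, level index, price, size), not IB's actual API:

    ```python
    # Minimal sketch of rebuilding a fixed-depth ladder from per-level size
    # updates. Each update overwrites one slot; a real implementation would
    # also handle level insertion/deletion shifting the ladder.

    def apply_update(book, side, level, price, size):
        """Overwrite one slot of the ladder; size 0 clears it."""
        book[side][level] = (price, size) if size > 0 else None

    book = {"bid": [None] * 5, "ask": [None] * 5}
    apply_update(book, "ask", 0, 95.82, 5)
    apply_update(book, "ask", 1, 95.83, 26)
    apply_update(book, "ask", 0, 95.87, 33)  # after the sweep: new best offer
    print(book["ask"][0])  # (95.87, 33)
    ```
    
    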

    Some people say that aggregated data is crap. I'm not so sure. I figure what I am missing in data aggregation I make up for in being more current. I'm open to changing my mind about this.

    In your example, I don't see a lot of inconsistency. It looks like there was a pretty large marketable order to buy at 95.87 that swept all the offers from 95.82 to 95.87. I wouldn't see any reason for the exchange to update the order book until the entire buy transaction is transmitted. I would think a marketable order would be filled entirely from the order book (i.e. from resting orders), so there is no way a new order could participate.
  5. jelite


    As for your first question, I don't know it for certain - I would like to. But this is what I was told by the IQFeed people. Specifically, they told me that CME has two separate feeds: one for L1 data (trades and inside quotes) and another for book updates. Maybe someone else here knows for certain.

    As for IB or any other aggregated feed - you really are missing a lot there. Consider that some (many) very important events play out within a millisecond. If you happen to 'miss it' due to feed aggregation, you miss much of what the order flow contains.

    In my example, what interests me is how the additional transacted volume at 95.85 (20 contracts more than posted) and 95.86 can be explained by some logic of how CME order book updates work.
    Note, even if a huge market order swept the book, there would still have to be someone adding 20 contracts at 95.85 'just at that time', which should be reflected in depth-of-book data. Or maybe, as you suggest-but I don't know this and would like to- in such situations all updates are 'frozen' until that order completes. These are some of the things that bug me.
  6. rwk


    This is how I envision a crossing engine working. I'm not speaking with certainty here, only theorizing.

    An order can only be an entirely new order (buy or sell), a change order, or a cancellation. As it takes an order from the input queue, the exchange checks to see whether the order is marketable. If the order is marketable, it performs the cross, and prints the trade to the tape. It then updates the book. Every transaction affects the book regardless of whether there is a cross. If an order is not marketable, it gets booked.

    In your example, a big buy order sweeps several price levels. The tape will show multiple prints, hopefully at least one per price level, but better yet would be one per offer. I see no reason for the exchange to re-send the order book until all those prints are sent. I also don't see any problems with aggregating two or more of those prints as long as they are at the same price.

    The processing of input transactions all happens in microseconds (or faster). Since no traders can respond in time to participate, there is no point in re-transmitting the updated order book until the input queue is empty. That would just waste bandwidth. Once the input queue is empty, the exchange can set about re-sending the revised order book.

    If this is how the crossing engine actually works, it explains how additional supply can sneak into the middle of the price ladder. It comes out of the input queue. I suspect the revised order book only gets transmitted after the crossing engine has worked off any backlog.
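    That theory can be modeled in a few lines. This is entirely hypothetical, a toy of the mechanism described above, not CME's actual engine: the queue is drained, trades print immediately, and the book snapshot is only published once the queue is empty.

    ```python
    # Toy crossing engine: marketable buy orders are filled from resting
    # asks (best-first), each fill prints to the tape at once, and the
    # revised book is only snapshotted after the input queue drains.

    from collections import deque

    def run_engine(book_asks, orders):
        """book_asks: [price, size] lists, best-first. orders: (limit, qty) buys."""
        tape, snapshots = [], []
        queue = deque(orders)
        while queue:
            limit, qty = queue.popleft()
            for level in book_asks:
                if qty == 0 or level[0] > limit:
                    break
                fill = min(qty, level[1])
                tape.append((level[0], fill))  # trade prints immediately
                level[1] -= fill
                qty -= fill
            book_asks[:] = [lv for lv in book_asks if lv[1] > 0]
        # book is re-sent only once the queue is empty
        snapshots.append([tuple(lv) for lv in book_asks])
        return tape, snapshots

    tape, snaps = run_engine([[95.82, 5], [95.83, 26]], [(95.83, 20)])
    print(tape)   # [(95.82, 5), (95.83, 15)]
    print(snaps)  # [[(95.83, 11)]]
    ```

    Note the observer sees two prints but only one book update, which matches the 'frozen then catch up' pattern described earlier in the thread.
    
    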
  7. jb514


    The proper term is conflation. The data is conflated, meaning they don't send a message for every single change; sometimes multiple changes happening at two points in time are sent as the same message.
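    Conflation as described above can be sketched like this (illustrative field names; the real feed conflates per-interval, per-instrument):

    ```python
    # Several updates to the same book level inside one interval collapse
    # into a single outbound message, keeping only the latest state.

    def conflate(updates):
        """Keep only the last size seen per price within the batch."""
        latest = {}
        for price, size in updates:  # earlier updates are overwritten
            latest[price] = size
        return sorted(latest.items())

    raw = [(95.85, 60), (95.85, 40), (95.85, 80), (95.86, 33), (95.86, 40)]
    print(conflate(raw))  # [(95.85, 80), (95.86, 40)]
    ```

    A subscriber to the conflated stream sees only the final sizes; the intermediate exhaustion/replenishment at 95.85 is invisible.
    
    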
  8. ofthomas


    to derive any benefit from the information you seek to explore you will need FOB (full order book) data, which will contain every single update to the resting orders... sweeping the book means little if you cannot understand how quickly the orders behind were replenished... keep in mind all this happens in the LL/ULL space, and as such, to take advantage of it you will be competing with the HFTs in that space... so you would be entering the "time/space wars"...

    anyhow, some analysis can be done with CQG... but realistically you need CEP to analyze the FOB feeds live, notice the event and then act accordingly... there is also ...sceeto.com, which can help you pinpoint the events on more generally available platforms like NT7/MC/SC/TS .... a lot of the work has been done for you within sceeto.com, you just need to determine what actions you want to take depending on how you interpret the events...
  9. gmst


    I hear NYSE TAQ data is the most detailed covering every single damn change. Of course it won't be useful for you directly since you are talking about CME/GLOBEX products. However, no harm in understanding NYSE TAQ dataset to better understand how exchanges in general send their most detailed feed.

    Does anyone know of a CME/GLOBEX feed that corresponds in detail to the NYSE TAQ data? Thanks.
  10. rwk


    I was reading through some old related threads on ET, and that triggered a realization. I think what I overlooked is that the hidden size is probably coming from stops being elected. There could be new orders arriving, but more likely it is from stops. When a stop is elected, in most cases it becomes a marketable limit order immediately (i.e. with no change in the timestamp).
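    The stop-election idea can be sketched as follows. This is a hedged toy of the mechanism described above (all structures illustrative): once the trade price touches a resting buy-stop's trigger, the stop converts into a marketable order at the same timestamp, adding flow mid-sweep that never appeared as resting book size.

    ```python
    # Buy stops elect when the trade price reaches their trigger; elected
    # stops become marketable immediately, with no timestamp change.

    def elect_stops(trade_px, stops):
        """Split resting buy stops into (elected, still resting)."""
        elected = [s for s in stops if trade_px >= s["stop_px"]]
        remaining = [s for s in stops if trade_px < s["stop_px"]]
        return elected, remaining

    stops = [{"stop_px": 95.85, "qty": 20}, {"stop_px": 95.90, "qty": 10}]
    elected, stops = elect_stops(95.85, stops)
    print([s["qty"] for s in elected])  # [20]
    ```
    
    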
    #10     Jul 7, 2013