Reconstructing an order book from an ITCH feed: am I missing something?

johnny_no_lots · Aug 10, 2012

I am attempting to reconstruct the order book for a single day (2012-07-09), using TradingPhysics' Nasdaq ITCH CSV file. The simulation is congruent with the ITCH feed -- by which I mean that, applying each instruction to my order book, I get the same prices and order IDs executing -- until 09:30:00.419-04. (That's not just a second or so of congruence; the first trade is at 07:00:03 and 91,404 messages were successfully processed before the error was encountered.)

At 09:30:00.419-04, the ITCH feed says that an order placed immediately prior to the execution (0 seconds difference) executed for 300 shares sold. That order did not improve the best offer, and there were already 3,100 shares sitting at the price it offered -- that is, it was not first in queue. My order book instead said that if 300 shares were executing, they should have come off an order that was placed at 09:30:00.419-04 (yes, same time, but much higher order ID). This order set the best offer.

My limit order book is well-tested, so either 1) I have failed to implement a rule that I am unaware of or 2) the ITCH feed is wrong. In general, I always assume I'm at fault until proven otherwise, so I'm going with (1).

I thought that perhaps the subsequently placed but immediately executed order was being executed elsewhere, but I don't see why that would happen.

sma202 · Aug 10, 2012

You do realize that bid/quote timestamps don't correlate 100% with trade timestamps? As in, the reporting of a trade can be delayed by some amount of time by the broker?

monstimal · Aug 10, 2012

Not sure I'm following exactly, but are there two messages in the feed, one putting the 300 shares on the offer and the second showing those 300 executing... and they have the same nanosecond timestamp?
Is it possible the 3100 was post-only, there was also a hidden post-only bid of at least 300 at the same price, and the 300 you saw go on the book took that liquidity? I haven't looked closely at the ITCH messages since post-only came into use, so I'm not sure if that's how they would show it.

johnny_no_lots · Aug 10, 2012

sma202 and monstimal: thank you both for your responses.

Subsequent to my post, I modified my simulator to execute trades exactly as the ITCH feed specified. The three general classes of ITCH messages -- adds, cancels, and executes -- all carry an order ID. By hitting the order with that exact ID, rather than just submitting the trade to my limit order book and letting it match, all errors were resolved. (It wasn't just mismatched executions -- it was also failures to delete orders that my book had already hit.)
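A minimal sketch of that ID-driven replay, for anyone following along. This is not the actual simulator; the class and method names (`IdDrivenBook`, `add`, `reduce`, `delete`, `depth`) are made up for illustration, and mapping real ITCH message types onto these three calls is left out.

```python
from collections import OrderedDict, defaultdict
from dataclasses import dataclass


@dataclass
class Order:
    order_id: int
    is_buy: bool
    price: int   # integer price, in whatever units the file uses
    shares: int


class IdDrivenBook:
    """Book that is only ever mutated through the order ID named in a message."""

    def __init__(self):
        self.orders = {}                        # order_id -> Order
        self.levels = defaultdict(OrderedDict)  # (is_buy, price) -> FIFO of orders

    def add(self, order_id, is_buy, price, shares):
        order = Order(order_id, is_buy, price, shares)
        self.orders[order_id] = order
        self.levels[(is_buy, price)][order_id] = order

    def reduce(self, order_id, shares):
        """Apply an execute or partial cancel to exactly this order."""
        order = self.orders[order_id]
        order.shares -= shares
        if order.shares <= 0:
            self.delete(order_id)

    def delete(self, order_id):
        order = self.orders.pop(order_id)
        key = (order.is_buy, order.price)
        del self.levels[key][order_id]
        if not self.levels[key]:
            del self.levels[key]

    def depth(self, is_buy, price):
        """Total displayed shares at one price level."""
        level = self.levels.get((is_buy, price), {})
        return sum(o.shares for o in level.values())


book = IdDrivenBook()
book.add(1, False, 91300, 3100)  # resting offer, first in queue
book.add(2, False, 91300, 300)   # later offer at the same price
book.reduce(2, 300)              # feed names order 2, so order 2 is the one hit
```

The point of the design is that the book never decides who trades; the feed does. Order 1 keeps all 3100 shares even though it was ahead in the queue.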

At this point, my reconstruction identically reproduces the ITCH feed, but I'm still not sure exactly why the executes are out of order. Perhaps it is due to what sma202 mentioned -- executed orders are not immediately reported. In that case, I have to do some research to see how this is handled on Nasdaq's side. If an order placed on the NASDAQ books is executed with a delayed notice, how does NASDAQ know not to execute it against other orders?

I suspect that monstimal is right -- I failed to account for certain order types that NASDAQ provides, such as post-only [1]. I have since read up more on their order types.

[1] http://www.nasdaqtrader.com/content/ProductsServices/Trading/postonly_factsheet.pdf

hft_boy · Nov 19, 2012

Yeah, I'm confused, too. I just got a similar file from TradingPhysics, and set up a simulation. My problem is that I have executions against orders which are not BBO, but sitting further back in the queue. Something like, I get order 1 submitted at price X, then a bunch of orders, then order 2. Order 2 is sitting near the back of the queue. Then, it gets executed against. Some *400 seconds* later, order 1 gets executed. Anybody, including OP, have any light to shed on this? Looks like a violation of price-time priority, which can't be right.

Thanks

johnny_no_lots · Nov 19, 2012

hft_boy: From what I understand now, there is some leeway in reporting when an order is crossed. I can hypothesize about why, but it wouldn't be too useful for you, since it's merely conjecture.

Instead, I solved this by simply adding an adjacent structure to my order book implementation that crossed trades from the ITCH feed based on order ID. This maintained the integrity of my order book. Then, when doing Monte Carlo analysis, I'd execute my simulated orders against the top of book, as would be expected.
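For the simulated side, something along these lines is what "against the top of book" means: consume the queue at the best price from the front, in time priority. This is only a sketch; `fill_from_front` and the `[order_id, shares]` entry layout are hypothetical names chosen for the example.

```python
from collections import deque


def fill_from_front(queue, shares):
    """Consume `shares` from a FIFO deque of [order_id, shares], oldest first.

    Returns a list of (order_id, filled_shares) and mutates `queue` in place.
    """
    fills = []
    while shares > 0 and queue:
        order_id, resting = queue[0]
        take = min(shares, resting)
        fills.append((order_id, take))
        shares -= take
        if take == resting:
            queue.popleft()                # order fully filled, leaves the queue
        else:
            queue[0][1] = resting - take   # partial fill, keeps its place
    return fills


best_offer = deque([[1, 3100], [2, 300]])
print(fill_from_front(best_offer, 300))   # [(1, 300)]
```

Feed-driven executions bypass this function entirely and go by order ID; only orders injected by the simulation are matched this way.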

hft_boy · Nov 19, 2012

Thanks for the response. But 400 seconds of leeway? First, as far as I understand, there isn't any leeway going on. The ITCH feed provides enough information to model the ECN state. These are all executions against displayed quotes, and the feed is faithful to the order of executions. Otherwise the feed would be broken. Second, 400 seconds of leeway is just too much. IIRC most dark pools have to report trades within 30 seconds or so.

These executions do not randomly happen. As far as I have seen, they happen after movements of BBO. For example, this is from last Friday (11/16/12), tracing the life of order 10460488. All times are seconds past midnight.

here is the quote change
9110 @91200 x 4400 @91300
9110 @91200 x 15400 @91400

about 20 ticks later,
trace 10460488 add buy:true,shares:700,price:91200,time:34210.712000000
position in queue 14

about 200 ticks later, and the first execution since the quote change:
trace 10460488 execute,execshares:700,time:34211.710000000,0
position in queue 9

and this is what the queue of orders at 91200 was at the time
//limit:91200,size:11310,buy:true,is BBO:true//
orderId,shares,entryTime
10177853,300,34204.318000000
10281616,5000,34205.827000000
10403582,1310,34208.890000000
10414938,400,34209.242000000
10414942,600,34209.242000000
10414949,100,34209.242000000
10414968,1000,34209.242000000
10414969,300,34209.242000000
10450934,100,34210.485000000
-->10460488,700,34210.712000000
10460627,600,34210.713000000
10460741,1600,34210.713000000
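The queue position and the number of shares ahead of the traced order can be recomputed from the dump above with a few lines. This is just a check on the numbers, assuming the rows are listed in time priority; `queue_position` is a hypothetical helper, and the `-->` marker has been stripped from the copied data.

```python
import csv
import io

QUEUE_DUMP = """orderId,shares,entryTime
10177853,300,34204.318000000
10281616,5000,34205.827000000
10403582,1310,34208.890000000
10414938,400,34209.242000000
10414942,600,34209.242000000
10414949,100,34209.242000000
10414968,1000,34209.242000000
10414969,300,34209.242000000
10450934,100,34210.485000000
10460488,700,34210.712000000
10460627,600,34210.713000000
10460741,1600,34210.713000000
"""


def queue_position(dump, order_id):
    """Return (zero-based position, shares queued ahead) for `order_id`."""
    ahead = 0
    for index, row in enumerate(csv.DictReader(io.StringIO(dump))):
        if int(row["orderId"]) == order_id:
            return index, ahead
        ahead += int(row["shares"])
    raise KeyError(order_id)


position, shares_ahead = queue_position(QUEUE_DUMP, 10460488)
print(position, shares_ahead)   # 9 9110
```

So the order sits at zero-based position 9 with 9110 shares in front of it, none of which were touched by the execution. That 9110 is the same figure shown at 91200 in the quote lines above.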

There must be some sort of priority which I'm not aware of. Maybe these are pegged orders which spontaneously show up at BBO changes for some reason. Or maybe they are routed from other exchanges. Any ideas?

fareastcoast · Nov 20, 2012

Do you know the sale condition of these errant trades? Sometimes, executions are indeed reported very late. And these late trades may even come with a recent exchange timestamp, but are actually from an earlier execution.

Finally, I've found that it is hugely important for backtesting to have, in addition to the exchange timestamp, the ticker factory timestamp. What I mean by ticker factory timestamp is, the timestamp from the instant the market data vendor received and processed the tick. Obviously, this depends on where the ticker factory is located.

For instance, in QuantQuote market data, the ticker factory timestamp is for a location on the NJ side. If you were to host your machine on the NY side or elsewhere outside of the exchange colo facilities, the exchange with the longest delay could be a different one. That's why it is hard to really account for these high-frequency effects until you are in your production network environment.

hft_boy · Nov 21, 2012

I don't know the sale conditions. AFAIK these are vanilla displayed orders with vanilla executions, as specified in the ITCH spec. I don't understand how the executions can be delayed without breaking the feed. For example, suppose there is one order setting the best bid. If it gets executed against, and Nasdaq reports the trade late -- for whatever reason -- then everybody will have a false idea of what the best displayed bid is.

Thanks for the remark about timestamps and backtesting. At the moment, it is not so important. All that matters is that the messages are in the right order. Since the data seems to just be a dump of the packets that the ticker plant (TradingPhysics) received, I don't see why they wouldn't be.

fareastcoast · Nov 21, 2012

Unfortunately, out-of-sequence trades occur quite often, and it is necessary to use the sale condition to distinguish them. This is something that the QuantQuote tick data has; the NASDAQ ITCH data should have it as well.

Basically, after an execution occurs, the exchanges don't always report it right away, for a variety of reasons. If an exchange reports a trade out of sequence, it assigns a special sale condition code to it so you know that the execution is out of sequence. Usually in testing and simulation, I throw these trades out, because you really can't say with much certainty where in the sequence they occurred.
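Throwing them out is a one-line filter once each trade carries a flag. A rough sketch follows, where the `Trade` record and its `out_of_sequence` field are hypothetical; how that flag is derived from a given vendor's sale condition codes is an assumption left to the reader and their data documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    timestamp: float       # seconds past midnight
    price: int
    shares: int
    out_of_sequence: bool  # hypothetical flag set from the sale condition


def in_sequence_only(trades):
    """Keep only trades that were reported in sequence, preserving order."""
    return [t for t in trades if not t.out_of_sequence]


trades = [
    Trade(34211.710, 91200, 700, False),
    Trade(34211.900, 91200, 300, True),   # late report, dropped below
    Trade(34212.050, 91300, 100, False),
]
print(len(in_sequence_only(trades)))   # 2
```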