Here is some data collected today as the ES went to 1505.25 for the last time, then began going up. It was captured from IB's data feed and is a sequence of price and size events, in the order in which they occurred.

Does anyone know why:
1. "last size" occurs twice
2. Sometimes one "last size" added to the previous "volume" will equal the next "volume," but not always.

Oh! And one very important question: What price did the "last size" occur at?

last price: 1505.500000, 0
last size: 198
last size: 198
volume: 1425299
bid price: 1505.250000, 1
bid size: 824
ask price: 1505.500000, 1
ask size: 52
bid size: 824
ask size: 52
last size: 103
volume: 1425402
bid size: 748
ask size: 20
last size: 1
volume: 1425416
bid size: 715
ask size: 513
last price: 1505.250000, 0
last size: 101
last size: 101
volume: 1425517
bid size: 606
ask size: 487
bid size: 573
last price: 1505.500000, 0
last size: 347
last size: 347
volume: 1425904
bid size: 561
ask size: 101
From what I understand, the IB data feed does not provide every single tick, which can result in some strange trade sequences. I know this to be true under the API toolkit, so I would assume it is also true under their Trader Workstation (TWS).
Figured out what to do: I'll watch the IB matrix when the market is slow, take notes, then compare with the data captured from the API. I can tell what is happening on the matrix, when it is slow, so I should be able to figure out what the sequence is in the API data by comparing with my matrix notes. I'll report back on the results. May try Sunday night.
The duplicate size problem is discussed here (IB login required): http://www.interactivebrokers.com/cgi-bin/discus/board-auth.pl?file=/2/40099.html

Here is an excerpt from Richard King:

The IB datafeed is optimised to ensure that it keeps up with the market no matter how busy the market is. To accomplish this, it effectively sends a price snapshot for each instrument at regular intervals. This interval seems to be about 300 milliseconds.

For each of bid, ask, and last it compares the current price and size with the values at the last snapshot. If the price is different it sends both price and size. If the price is the same, but the size is different it sends only the size. If both price and size are the same, it doesn't send either. If there have been any trades since the last snapshot, it sends the (accumulated) volume (so where the price and size haven't changed but there have been one or more trades, this can be detected from the increased volume).

A word of caution though: this is not an exact science. It would be nice if what I said in my post was an exact description of how it works, but you'll find odd things happening occasionally, such as a volume update without a prior size message where the increase in volume is not an exact multiple of the most recent size message, or multiple last price/size messages sent at the same time, or volume messages with a smaller volume than the previous one! But most of the time my description is accurate.

By the way, one gotcha is that when both price and size messages are sent (in a single TICK_PRICE socket message), TWS also sends the size again in a separate TICK_SIZE message, but the volume is correctly updated only once.
I think the reason for this duplication is that before the version 2 TICK_PRICE message was introduced, it didn't contain a size field, so prices and sizes were always sent separately: if TWS didn't send the duplicate size, then programs that relied on the separate TICK_SIZE message would no longer work properly unless they were amended and recompiled.

This mechanism enables IB to know the maximum bandwidth required for each ticker, and hence for each customer (since the number of tickers is limited), and so it can size its servers to be able to cope with that load. If a market becomes very busy, it makes no difference because it will still only send an update three times a second or thereabouts, even if there have been 100 trades during that second. This avoids the problem that every other data feed seems to have, where the data will sometimes lag way behind the market at busy times (with every other vendor I've used, I've had occasions where the data could be anything up to two or three minutes behind the market).

There is an irritating side effect of this technique, which is that price movements between snapshots may not be reported at all: for example if the last price at snapshot 1 is 100, and then price moves up to 102 and then back to 101 by snapshot 2, the price reported at snapshot 2 will be 101, and the 102 price will not be reported at all. This leads to occasional incorrect highs and lows of bars, but rarely by more than one tick: whether that is significant depends very much on the trading strategy used.

The above isn't a complete description, but it covers the basic mechanism.
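To make the snapshot idea in the quoted description concrete, here is a rough Python sketch of the "last" side only. It is a toy model built from that description, not IB's actual implementation: the function name `snapshot_events`, the `(time_ms, price, size)` trade tuples, and the fixed 300 ms interval are all assumptions for illustration, and the duplicate TICK_SIZE message is not modelled.

```python
def snapshot_events(trades, interval_ms=300):
    """Toy model: turn (time_ms, price, size) trades into snapshot-style events.

    Hypothetical helper; follows the rules in the quoted post, nothing more.
    """
    pending = sorted(trades, key=lambda t: t[0])
    events = []
    prev_price = prev_size = None   # values sent at the previous snapshot
    prev_volume = 0
    price = size = None             # most recent trade seen so far
    volume = 0                      # cumulative volume
    i = 0
    snap = interval_ms
    while i < len(pending):
        # Absorb every trade that happened up to this snapshot time.
        while i < len(pending) and pending[i][0] <= snap:
            _, price, size = pending[i]
            volume += size
            i += 1
        if price != prev_price:
            # Price changed: send both price and size.
            events.append((snap, "last price", price))
            events.append((snap, "last size", size))
        elif size != prev_size:
            # Same price, different size: send only the size.
            events.append((snap, "last size", size))
        if volume > prev_volume:
            # Any trades since the last snapshot: send accumulated volume.
            events.append((snap, "volume", volume))
        prev_price, prev_size, prev_volume = price, size, volume
        snap += interval_ms
    return events


# Made-up trades: price goes 100 -> 102 -> 101 inside the first interval,
# then one more trade at 101 with the same size in the second interval.
SAMPLE_TRADES = [(50, 100.0, 5), (120, 102.0, 1), (250, 101.0, 2), (400, 101.0, 2)]

for event in snapshot_events(SAMPLE_TRADES):
    print(event)
```

In this toy run the 102 print never shows up, and the second snapshot produces only a volume event because price and size are unchanged, which is the "volume increase without a last size" case.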
Thanks for that information. I registered there. In spite of the problems, I hope to mine the data I am collecting. Am working with gnuplot -- heard of it in Bollinger's book -- which has a finance.dem script from Bollinger himself!
Here is some data from this evening's session, right after connecting via the API. It appears there may be two "problems":
1. A volume increase without a "last size" event.
2. A volume increase with two "last size" events, one of which should be ignored, if the "volume" event is accurate.

The order of the fields is: Timestamp Event Ticker Value AutoExecute (if applicable)

Code:
18:45:32:213 ask size 8 70
18:45:32:215 last price 8 1516.500000 0
18:45:32:216 last size 8 5
18:45:32:218 bid size 8 71
18:45:32:219 ask size 8 70
18:45:32:222 last size 8 5
18:45:32:226 volume 8 3548
18:45:32:228 high price 8 1516.750000 0
18:45:32:229 low price 8 1514.750000 0
18:45:32:231 close price 8 1515.500000 0
18:45:46:330 ask size 8 75
18:45:56:856 last size 8 1 <- one last size
18:45:56:858 volume 8 3549
18:45:56:859 bid size 8 70
18:46:07:855 ask size 8 95
18:46:19:606 bid size 8 71
18:46:40:330 bid size 8 70
18:47:01:116 volume 8 3550 <- no last size
18:47:01:118 bid size 8 69
18:49:08:351 last size 8 6 <- one last size
18:49:08:387 volume 8 3556
18:49:08:428 bid size 8 63
18:49:56:142 last price 8 1516.750000 0
18:49:56:144 last size 8 2 <- 1st last size
18:49:56:145 last size 8 2 <- 2nd last size
18:49:56:146 volume 8 3558
18:49:56:148 ask size 8 93
18:50:19:893 ask size 8 98
18:50:26:893 bid size 8 68
18:50:33:395 last size 8 5
18:50:33:397 volume 8 3563
18:50:33:398 ask size 8 93
18:50:33:895 last size 8 10
18:50:33:896 volume 8 3573
18:50:33:898 ask size 8 73
18:50:38:398 bid size 8 69
18:50:39:647 ask size 8 84
18:50:40:147 bid size 8 74
18:50:46:370 bid size 8 75
18:50:58:401 ask size 8 85
18:50:58:650 last size 8 1 <- one last size
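The two "problems" can be flagged mechanically. The sketch below parses lines in the field order given in this post and labels each "volume" event by comparing its increase with the "last size" events seen since the previous "volume". The names `parse`, `audit`, and `LOG_EXCERPT`, and the verdict labels, are made up for illustration; the excerpt is copied from the log in this post with the bid/ask lines left out.

```python
import re

# Timestamp, event name, ticker id, value. Anything after the value
# (AutoExecute flag, "<- ..." annotations) is ignored.
LINE = re.compile(r"^(\d\d:\d\d:\d\d:\d\d\d) ([a-z ]+?) (\d+) (\S+)")


def parse(line):
    m = LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    stamp, event, ticker, value = m.groups()
    return stamp, event, int(ticker), float(value)


def audit(lines):
    """Return (timestamp, volume, verdict) for every volume event."""
    report = []
    prev_volume = None
    sizes = []  # last sizes seen since the previous volume event
    for line in lines:
        stamp, event, _ticker, value = parse(line)
        if event == "last size":
            sizes.append(int(value))
        elif event == "volume":
            volume = int(value)
            if prev_volume is None:
                verdict = "baseline"  # nothing to compare against yet
            else:
                delta = volume - prev_volume
                if not sizes:
                    verdict = "no last size"
                elif sum(sizes) == delta:
                    verdict = "consistent"
                elif len(set(sizes)) == 1 and sizes[0] == delta:
                    verdict = "duplicate last size"
                else:
                    verdict = "mismatch"
            report.append((stamp, volume, verdict))
            prev_volume = volume
            sizes = []
    return report


LOG_EXCERPT = """\
18:45:32:215 last price 8 1516.500000 0
18:45:32:216 last size 8 5
18:45:32:222 last size 8 5
18:45:32:226 volume 8 3548
18:45:56:856 last size 8 1 <- one last size
18:45:56:858 volume 8 3549
18:47:01:116 volume 8 3550 <- no last size
18:49:08:351 last size 8 6 <- one last size
18:49:08:387 volume 8 3556
18:49:56:142 last price 8 1516.750000 0
18:49:56:144 last size 8 2 <- 1st last size
18:49:56:145 last size 8 2 <- 2nd last size
18:49:56:146 volume 8 3558
""".splitlines()

for row in audit(LOG_EXCERPT):
    print(row)
```

On this excerpt the 18:47:01 volume comes out as "no last size" and the 18:49:56 volume as "duplicate last size", matching the annotations in the log.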
If you just ignore the LAST_SIZE events and work from the VOLUME_EVENT, things will be fine. The cumulative volume is accurate.
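A minimal sketch of that approach, in Python: take only the cumulative volume readings and difference them. The function name `volume_deltas` is hypothetical. Skipping any reading that is not above the highest volume seen so far is my own assumption for coping with the occasional smaller-than-previous volume message mentioned in the quoted post, not something IB documents.

```python
def volume_deltas(volumes):
    """Volume traded between successive cumulative-volume readings."""
    deltas = []
    high = None  # highest cumulative volume seen so far
    for v in volumes:
        if high is None:
            high = v  # first reading is only a baseline
        elif v > high:
            deltas.append(v - high)
            high = v
        # Otherwise the reading is not above the high-water mark
        # (a repeat or a backwards step), so it is ignored. Assumption.
    return deltas


# Cumulative volume readings taken from the log earlier in the thread.
readings = [3548, 3549, 3550, 3556, 3558, 3563, 3573]
print(volume_deltas(readings))
```

Each delta is the total traded between two volume updates, so it may cover several trades rather than a single print.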