IBKR - collecting and processing tick-by-tick data

Discussion in 'Automated Trading' started by BrazilForever, Sep 16, 2022.

  1. I use the Python code below to collect tick-by-tick daily trades for SPY (via IBKR). The code works fine without thread locking, but if I introduce thread locking I start missing many ticks. My guess is that acquiring the lock is time-consuming and the server-side buffer is small, so some data gets dropped.
    I am trying to think of a way to fix this problem, but I do not know how to avoid the lock. The lock ensures that I can safely put data into a list (or whatever collection structure) before the analysis/decision-making routine accesses it.
    I ran this on a Google Compute Engine instance with 2 cores and 8 GB of RAM, which also hosts IBKR TWS.

    Any suggestions?
    P.S. Sorry for cross-posting.

    from ibapi.client import EClient
    from ibapi.wrapper import EWrapper
    from ibapi.contract import Contract
    from ibapi.ticktype import TickTypeEnum
    import time
    import threading

    class TestApp(EWrapper, EClient):
        def __init__(self):
            EClient.__init__(self, self)
            # one list per field; collectData appends to all four under the lock
            self.last_exchange = []
            self.last_market_time = []
            self.last_price = []
            self.last_size = []
            self.lock_last = threading.Lock()

        def error(self, reqId, errorCode, errorString):
            print("Error: ", reqId, " ", errorCode, " ", errorString)

        def tickByTickAllLast(self, reqId, tickType, time, price, size,
                              tickAttribLast, exchange, specialConditions):
            super().tickByTickAllLast(reqId, tickType, time, price, size,
                                      tickAttribLast, exchange, specialConditions)
            self.collectData(exchange, time, price, size)

        def collectData(self, exchange, market_time, price, size):
            self.lock_last.acquire()
            self.last_exchange = self.last_exchange + [exchange]
            self.last_market_time = self.last_market_time + [market_time]
            self.last_price = self.last_price + [price]
            self.last_size = self.last_size + [size]
            self.lock_last.release()

    app = TestApp()

    contract = Contract()
    contract.symbol = "SPY"
    contract.secType = "STK"
    contract.exchange = "SMART"
    contract.currency = "USD"
    contract.primaryExchange = "ARCA"

    app.connect("127.0.0.1", 4002, 0)
    api_thread = threading.Thread(target=app.run)  # run the message loop in its own thread
    api_thread.start()
    time.sleep(1)                                  # give the connection a moment to settle
    app.reqTickByTickData(1, contract, "Last", 0, False)
     
    blueraincap likes this.
  2. 2rosy

    Remove the locking. You just need to append to a list, not create a new list by doing ...
    newlist = oldlist + [new_data]

    So instead, for all your lists, do ...

    self.last_size.append(size)
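
    In other words, collectData could become something like this (lock removed, as suggested; whether that is actually safe is discussed in the next posts):

    def collectData(self, exchange, market_time, price, size):
        # append mutates the existing lists in place instead of rebuilding them
        self.last_exchange.append(exchange)
        self.last_market_time.append(market_time)
        self.last_price.append(price)
        self.last_size.append(size)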
     
    M.W. and Baron like this.
  3. Thank you for the reply! Using append is a good idea; it will probably be faster. But I am not sure we can drop the thread lock. Imagine there is another thread (call it the analysis thread) running the data-analysis function below.

    def checkForTrades(self):
        # reads self.last_size, self.last_price, self.last_market_time
        return some_opportunities

    If checkForTrades gets called before collectData is done, we might end up with an updated price list but a not-yet-updated size list. Or maybe even something worse, since thread switching is tricky.

    So, it is not clear to me how to ensure that updates are done before analysis starts.
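
    To be concrete, the guarantee I am after on the analysis side looks roughly like this: take the same lock and snapshot the lists before analyzing (names follow the code in my first post):

    def checkForTrades(self):
        # snapshot the lists under the producer's lock so they stay consistent,
        # then analyze the copies outside the lock
        with self.lock_last:
            prices = list(self.last_price)
            sizes = list(self.last_size)
            times = list(self.last_market_time)
        opportunities = []
        # ... scan prices/sizes/times for opportunities ...
        return opportunities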
     
  4. 2rosy

    First, you only need one list. Actually, you should use a queue and populate it with objects that hold all your last values. You do not need locks or threads.
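
    A minimal sketch of that idea with the standard library (the Tick type and tick_queue name are illustrative, not from the IB API):

    import queue
    from collections import namedtuple

    # one object per trade, so exchange/time/price/size always stay together
    Tick = namedtuple("Tick", ["exchange", "market_time", "price", "size"])

    tick_queue = queue.Queue()   # thread-safe FIFO from the standard library

    # producer side, e.g. inside tickByTickAllLast:
    #     tick_queue.put(Tick(exchange, market_time, price, size))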
     
    M.W. and d08 like this.
  5. A queue is a good idea; it is thread-safe. I already tried it. Since checkForTrades gets called every 1 or 2 seconds, the queue can accumulate 20-30 values, so I have to fully empty it. But I guess while I do that, I fall behind the collectData thread and lose some values.
    Plus, I don't really need the full FIFO queue functionality. Also, thread locking is probably how the queue achieves thread safety anyway.
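
    Concretely, the emptying I do every cycle looks roughly like this (using the tick_queue from the sketch above):

    import queue

    def drain(tick_queue):
        # pull everything currently queued without blocking
        ticks = []
        while True:
            try:
                ticks.append(tick_queue.get_nowait())
            except queue.Empty:
                break
        return ticks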
     
    globalarbtrader and spy like this.
  6. hanneswas

    Easy. Grab a book about distributed system design. ;)

    The basic pattern that is always described: use a message bus to communicate between modules without locks.
    If you publish your new ticks over MQTT, for example, you can create a new process (in Python, in Docker, or even on another computer) which can handle and process the ticks without interrupting the first one.
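
    For example, the publishing side with the paho-mqtt package might look like this (a sketch that assumes a broker such as mosquitto running on localhost; the topic name and JSON payload are just examples):

    import json
    import paho.mqtt.client as mqtt

    client = mqtt.Client()               # paho-mqtt 1.x style constructor
    client.connect("localhost", 1883)    # assumes a local MQTT broker is running
    client.loop_start()                  # network loop runs in a background thread

    def publish_tick(exchange, market_time, price, size):
        payload = json.dumps({"exchange": exchange, "time": market_time,
                              "price": price, "size": size})
        client.publish("ticks/SPY", payload)   # any subscriber process can consume this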
     
    spy likes this.
  7. You are right, I need to get a book on distributed system design. Not having read one, I do not understand the rest of your answer.
     
  8. I haven't used the IB library myself, and I am not sure what you mean by 'server' in this context, but as far as I understand, buffer size matters (at least in theory) for handling data that arrives in bursts and pauses: the buffer should be large enough to hold the longest possible burst (or be sized for the maximum length of time during which the burst rate can exceed the clearing rate). The rate at which the buffer is cleared, however, must exceed the average rate at which data comes in (pauses and bursts considered together). If that is the problem, increasing the buffer size, even if it were possible, won't help.

    If I refer to collectData as the producer and checkForTrades as the consumer, the basic problem is that the consumer is not processing the data as fast as the producer is producing it. If that is the case, the queue will eventually fill up. Maybe run the consumer on the second core (perhaps as a separate process) or on another node/instance, or optimize what you are doing in the consumer.
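
    A rough sketch of the separate-process option using only the standard library (the queue and function names are illustrative):

    from multiprocessing import Process, Queue

    def consumer(tick_queue):
        # runs in its own process, so slow analysis cannot stall the IB reader
        while True:
            tick = tick_queue.get()      # blocks until the producer puts a tick
            if tick is None:             # sentinel value used to shut down
                break
            # ... run the checkForTrades-style analysis here ...

    if __name__ == "__main__":
        tick_queue = Queue()
        Process(target=consumer, args=(tick_queue,), daemon=True).start()
        # producer side: tick_queue.put((exchange, time, price, size))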

    I have no experience in programming trades or HPC, but just out of curiosity, at what sort of frequency are the ticks coming in? Is it on the order of nanoseconds? Microseconds? Milliseconds?

    Okay, I think you answered it here. It sounds like 20-30 values in a couple of seconds, which isn't a lot, though of course it depends on what you are doing with each value. If these values can be processed independently, they can be distributed among different instances/vCPUs/cores/whatever.

    But if it's as slow as 10-15 ticks per second, losing values because of a lock/unlock sounds puzzling.
     
    Last edited: Sep 17, 2022
  9. Thank you joyfultrader. SPY gets roughly 400k trades a day (between 9:30 am and 4:00 pm), so we can compute the average speed. Sometimes it peaks, but I don't really know by how much. I would guess something around nanoseconds; at least that is what modern HFT systems are built for, with FPGAs and other complicated machinery.
     
  10. Okay, 400,000 (trades) / (6.5 (hrs) * 3600 (seconds/hr)) ≈ 17 trades/sec, which is roughly in line with the 10-15 figure above. That is around 58 milliseconds per trade: not fast by machine standards, but I reckon there are other stocks that dwarf this speed. And yes, I guess when SPY peaks it may be some orders of magnitude higher.
     
    Last edited: Sep 17, 2022
    #10     Sep 17, 2022