IBKR - collecting and processing tick-by-tick data

Discussion in 'Automated Trading' started by BrazilForever, Sep 16, 2022.

  1. I use the Python code below to collect tick-by-tick daily trades for SPY (via IBKR). The code works fine without thread locking, but if I introduce thread locking I start missing many ticks. My guess is that acquiring the lock is time-consuming and the server-side buffer is small, so some data gets dropped.
    I am trying to think of a way to fix this problem, but I do not know how to avoid the lock. The lock ensures that I can safely put data into a list (or whatever collection structure) before the analysis/decision-making routine accesses it.
    I ran this on a Google Compute Engine instance with 2 cores and 8 GB of RAM, which also hosts IBKR TWS.

    Any suggestions?
    P.S. Sorry for cross-posting.

    from ibapi.client import EClient
    from ibapi.wrapper import EWrapper
    from ibapi.contract import Contract
    from ibapi.ticktype import TickTypeEnum
    import time
    import threading

    class TestApp(EWrapper, EClient):
        def __init__(self):
            EClient.__init__(self, self)
            # one list per field; collectData appends to all four under the lock
            self.last_exchange = []
            self.last_market_time = []
            self.last_price = []
            self.last_size = []
            self.lock_last = threading.Lock()

        def error(self, reqId, errorCode, errorString):
            print("Error: ", reqId, " ", errorCode, " ", errorString)

        def tickByTickAllLast(self, reqId, tickType, time, price, size,
                              tickAttribLast, exchange, specialConditions):
            super().tickByTickAllLast(reqId, tickType, time, price, size,
                                      tickAttribLast, exchange, specialConditions)
            self.collectData(exchange, time, price, size)

        def collectData(self, exchange, market_time, price, size):
            self.lock_last.acquire()
            self.last_exchange = self.last_exchange + [exchange]
            self.last_market_time = self.last_market_time + [market_time]
            self.last_price = self.last_price + [price]
            self.last_size = self.last_size + [size]
            self.lock_last.release()

    app = TestApp()

    contract = Contract()
    contract.symbol = "SPY"
    contract.secType = "STK"
    contract.exchange = "SMART"
    contract.currency = "USD"
    contract.primaryExchange = "ARCA"

    app.connect("127.0.0.1", 4002, 0)
    api_thread = threading.Thread(target=app.run)  # run the message loop in its own thread
    api_thread.start()
    time.sleep(1)                                  # give the connection a moment to settle
    app.reqTickByTickData(1, contract, "Last", 0, False)
     
    blueraincap likes this.
  2. 2rosy

    Remove the locking. You just need to append to a list, not create a new list by doing ...
    newlist = oldlist + [new_data]

    So instead, for all your lists, do ...

    self.last_size.append(size)
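
    In other words, collectData could become something like this (lock removed, as suggested; whether that is actually safe is discussed in the next posts):

    def collectData(self, exchange, market_time, price, size):
        # append mutates the existing lists in place instead of rebuilding them
        self.last_exchange.append(exchange)
        self.last_market_time.append(market_time)
        self.last_price.append(price)
        self.last_size.append(size)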
     
    M.W. and Baron like this.
  3. Thank you for the reply! Using append is a good idea; it will probably be faster. But I am not sure we can drop the thread lock. Imagine there is another thread (call it the analysis thread) running the data-analysis function below.

    def checkForTrades(self):
        # reads self.last_size, self.last_price, self.last_market_time
        return some_opportunities

    If checkForTrades gets called before collectData is done, we might end up with an updated price list but a not-yet-updated size list. Or maybe even something worse, since thread switching is tricky.

    So, it is not clear to me how to ensure that updates are done before analysis starts.
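
    To be concrete, the guarantee I am after on the analysis side looks roughly like this: take the same lock and snapshot the lists before analyzing (names follow the code in my first post):

    def checkForTrades(self):
        # snapshot the lists under the producer's lock so they stay consistent,
        # then analyze the copies outside the lock
        with self.lock_last:
            prices = list(self.last_price)
            sizes = list(self.last_size)
            times = list(self.last_market_time)
        opportunities = []
        # ... scan prices/sizes/times for opportunities ...
        return opportunities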
     
  4. 2rosy

    First, you only need one list. Actually, you should use a queue and populate it with objects that hold all your last values. You do not need locks or threads.
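
    A minimal sketch of that idea with the standard library (the Tick type and tick_queue name are illustrative, not from the IB API):

    import queue
    from collections import namedtuple

    # one object per trade, so exchange/time/price/size always stay together
    Tick = namedtuple("Tick", ["exchange", "market_time", "price", "size"])

    tick_queue = queue.Queue()   # thread-safe FIFO from the standard library

    # producer side, e.g. inside tickByTickAllLast:
    #     tick_queue.put(Tick(exchange, market_time, price, size))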
     
    M.W. and d08 like this.
  5. A queue is a good idea; it is thread-safe. I already tried it. Since checkForTrades gets called every 1 or 2 seconds, the queue can accumulate 20-30 values, so I have to fully empty it. But I guess while I do that, I fall behind the collectData thread and lose some values.
    Plus, I don't really need the full FIFO queue functionality. Also, thread locking is probably how the queue achieves thread safety anyway.
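
    Concretely, the emptying I do every cycle looks roughly like this (using the tick_queue from the sketch above):

    import queue

    def drain(tick_queue):
        # pull everything currently queued without blocking
        ticks = []
        while True:
            try:
                ticks.append(tick_queue.get_nowait())
            except queue.Empty:
                break
        return ticks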
     
    globalarbtrader and spy like this.
  6. hanneswas

    Easy. Grab a book about distributed system design. ;)

    The basic pattern that is always described: use a message bus to communicate between modules without locks.
    If you publish your new ticks over MQTT, for example, you can create a new process (in Python, in Docker, or even on another computer) which can handle and process the ticks without interrupting the first one.
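
    For example, the publishing side with the paho-mqtt package might look like this (a sketch that assumes a broker such as mosquitto running on localhost; the topic name and JSON payload are just examples):

    import json
    import paho.mqtt.client as mqtt

    client = mqtt.Client()               # paho-mqtt 1.x style constructor
    client.connect("localhost", 1883)    # assumes a local MQTT broker is running
    client.loop_start()                  # network loop runs in a background thread

    def publish_tick(exchange, market_time, price, size):
        payload = json.dumps({"exchange": exchange, "time": market_time,
                              "price": price, "size": size})
        client.publish("ticks/SPY", payload)   # any subscriber process can consume this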
     
    spy likes this.
  7. You are right, I need to get a book on distributed system design. Not having read one, I do not understand the rest of your answer.
     
  8. I haven't used the IB library myself, and I am not sure what you mean by 'server' in this context, but as far as I understand, buffer size matters (at least in theory) for handling data that arrives in bursts and pauses: the buffer should be large enough to hold the longest possible burst (or be sized for the maximum length of time during which the burst rate can exceed the clearing rate). The rate at which the buffer is cleared, however, must exceed the average rate at which data comes in (pauses and bursts considered together). If that is the problem, increasing the buffer size, even if it were possible, won't help.

    If I refer to collectData as the producer and checkForTrades as the consumer, the basic problem is that the consumer is not processing the data as fast as the producer is producing it. If that is the case, the queue will eventually fill up. Maybe run the consumer on the second core (perhaps as a separate process) or on another node/instance, or optimize what you are doing in the consumer.
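
    A rough sketch of the separate-process option using only the standard library (the queue and function names are illustrative):

    from multiprocessing import Process, Queue

    def consumer(tick_queue):
        # runs in its own process, so slow analysis cannot stall the IB reader
        while True:
            tick = tick_queue.get()      # blocks until the producer puts a tick
            if tick is None:             # sentinel value used to shut down
                break
            # ... run the checkForTrades-style analysis here ...

    if __name__ == "__main__":
        tick_queue = Queue()
        Process(target=consumer, args=(tick_queue,), daemon=True).start()
        # producer side: tick_queue.put((exchange, time, price, size))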

    I have no experience in programming trades or HPC, but just out of curiosity, at what sort of frequency are the ticks coming in? Is it on the order of nanoseconds? Microseconds? Milliseconds?

    Okay, I think you answered it here. It sounds like 20-30 values in a couple of seconds, which isn't a lot, though of course it depends on what you are doing with each value. If these values can be processed independently, they can be distributed among different instances/vCPUs/cores/whatever.

    But if it's as slow as 10-15 ticks per second, losing values because of a lock/unlock sounds puzzling.
     
    Last edited: Sep 17, 2022
  9. Thank you joyfultrader. SPY gets roughly 400k trades a day (between 9:30 am and 4:00 pm), so we can compute the average speed. Sometimes it peaks, but I don't really know by how much. I would guess something around nanoseconds; at least that is what modern HFT systems are built for, with FPGAs and other complicated machinery.
     
  10. Okay, 400,000 (trades) / (6.5 (hrs) * 3600 (seconds/hr)) ≈ 17 trades/sec, which is roughly in line with the 10-15 figure above. That is around 58 milliseconds per trade: not fast by machine standards, but I reckon there are other stocks that dwarf this speed. And yes, I guess when SPY peaks it may be some orders of magnitude higher.
     
    Last edited: Sep 17, 2022
    #10     Sep 17, 2022