Appropriate design pattern for recording instant tick data in memory

vicirek · Dec 14, 2014

This is getting too complicated for a search of one item out of 40. You could run that search many times over, in several different ways at once, and it would not matter in terms of performance.

Check whether a search is even necessary. Many APIs assign an index to your request, or attach some other integer ID to the returned contract data, and you can use that to index directly into your struct. Check the documentation, because I have no experience with the Rithmic API.
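
A minimal sketch of the two lookup styles discussed here, assuming a fixed table of 40 contracts. The names `ContractState`, `find_by_symbol` and `find_by_id` are hypothetical and not part of any vendor API; the ID-based variant only applies if the API really supplies a small integer ID.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical per-contract state; field names are illustrative only.
struct ContractState {
    std::string symbol;
    double last_price = 0.0;
    std::uint64_t tick_count = 0;
};

constexpr std::size_t kMaxContracts = 40;
using ContractTable = std::array<ContractState, kMaxContracts>;

// Linear scan over the first `used` slots.
// Returns the slot index, or kMaxContracts if the symbol is not present.
inline std::size_t find_by_symbol(const ContractTable& table, std::size_t used,
                                  const std::string& symbol) {
    for (std::size_t i = 0; i < used && i < kMaxContracts; ++i) {
        if (table[i].symbol == symbol) return i;
    }
    return kMaxContracts;
}

// Assumption: the API hands back a small integer ID per subscription.
// In that case the ID is the slot and no search is needed.
inline ContractState* find_by_id(ContractTable& table, std::size_t id) {
    return id < kMaxContracts ? &table[id] : nullptr;
}
```

With only 40 entries, the scan touches a small, fixed amount of memory, so either variant should be cheap next to the cost of receiving the tick.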

The best way to address cache performance is to lay the data out contiguously in memory instead of using pointers to objects scattered all over main memory. Other than that, there is little you can do to influence the processor's prefetching heuristics.
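
To illustrate the layout difference, here is a small sketch that sums tick sizes over contiguous storage and over individually heap-allocated objects. The `Tick` struct and both function names are assumptions made for the example.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical tick record; the fields are illustrative only.
struct Tick {
    std::int64_t  ts_ns;
    double        price;
    std::uint32_t size;
    std::uint32_t contract;
};

// Contiguous layout: elements sit back to back, so a linear pass walks
// memory sequentially and is friendly to the hardware prefetcher.
inline std::uint64_t total_size_contiguous(const std::vector<Tick>& ticks) {
    std::uint64_t sum = 0;
    for (const Tick& t : ticks) sum += t.size;
    return sum;
}

// Scattered layout: each element is a separate heap allocation, so every
// step follows a pointer to a possibly unrelated address.
inline std::uint64_t total_size_scattered(
        const std::vector<std::unique_ptr<Tick>>& ticks) {
    std::uint64_t sum = 0;
    for (const auto& p : ticks) sum += p->size;
    return sum;
}
```

Both functions compute the same result; only the memory access pattern differs, which is the part the cache cares about.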

Keep in mind that the compiler will pad structs where necessary to align fields, so sizeof can differ from what you calculate by hand (compiler options that control struct padding are often available). Your solution creates an unnecessarily large struct.
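
A short sketch of the padding effect, using two hypothetical structs that hold the same fields in a different order. The fields add up to 22 bytes by hand; on a typical x86-64 ABI the first struct usually comes out at 40 bytes and the second at 24, but the exact numbers depend on the compiler and target.

```cpp
#include <cstddef>
#include <cstdint>

// Fields in an order that forces padding before each 8-byte member.
struct TickLoose {
    std::uint8_t  side;    // 1 byte, then padding up to the next 8-byte boundary
    double        price;   // 8 bytes
    std::uint8_t  flags;   // 1 byte, then padding again
    std::uint64_t ts_ns;   // 8 bytes
    std::uint32_t size;    // 4 bytes, plus tail padding
};

// Same fields, largest first: padding is limited to the tail.
struct TickTight {
    double        price;
    std::uint64_t ts_ns;
    std::uint32_t size;
    std::uint8_t  side;
    std::uint8_t  flags;
};

// Sum of the raw field sizes, as one would compute by hand.
constexpr std::size_t kHandComputed =
    sizeof(double) + sizeof(std::uint64_t) + sizeof(std::uint32_t) +
    2 * sizeof(std::uint8_t);

static_assert(sizeof(TickTight) <= sizeof(TickLoose),
              "reordering should never make the struct larger here");
static_assert(sizeof(TickLoose) >= kHandComputed,
              "padding can only add to the hand-computed size");
```

Printing `sizeof(TickLoose)` and `sizeof(TickTight)` on the actual build target is the reliable way to see what the compiler did.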

volpunter · Dec 15, 2014

Second that. OP, what are your latency requirements? Recording should never be a latency priority; the lookups more so. But what exact specs do you have?

hft_boy said:
Yep, no matter which OS / runtime abstraction you use (threads, processes, etc.), you can't escape the fact that you are ultimately using hardware synchronization primitives, which do all sorts of nasty things to your cache. I think if you use spinlocks to bypass the scheduler *and you are extremely careful* you can get hundreds of nanos instead of micros for intercore communication. But it is more difficult to reason about the determinism and will take 10x longer to write.

Honestly, the whole thing smells. The more interesting question is what latencies and throughput do you need? And architect around that. IMO there is not really any point to say, let's get this as fast as it can go because it needs to be really fast in the future, but rather, we need XYZ performance now, how can we get there. Each level of performance needs a special design anyways, you are going to save yourself a lot of time and headaches by not solving problems that don't need to be solved.

Sorry to go off on a rant and second guess your decision for you, just offering my opinion.

rohan2008 · Dec 15, 2014

vicirek said:
Check whether a search is even necessary. Many APIs assign an index to your request, or attach some other integer ID to the returned contract data, and you can use that to index directly into your struct. Check the documentation, because I have no experience with the Rithmic API.

RAPI doesn't have an integer ID for tick data; it does have IDs for other data, such as order reports.

volpunter said:
Second that. OP, what are your latency requirements? Recording should never be a latency priority; the lookups more so. But what exact specs do you have?

OK, here's the issue at hand: I have some low/medium frequency strategies that rely on one-minute candles. I trade in higher timeframes. I sense that I can develop a few algos that examine real-time tick data and reduce some entry risk for my existing strategies. My issue is how fast my algorithm must execute in order to take in all the ticks I get from the market without lagging. Note that I am not too worried about the latency involved in responding to the ticks (that's another topic).

Assume that I subscribe to tick data for 40 contracts (20 current and 20 forward; no options) and, let's say, I get about 20 million ticks per day. If we assume that 80% of the volume happens within 5 hours in the morning, that's 80% * 20 million / (3600 * 5), which comes to roughly 1000 trades a second, which gives me roughly 1 ms to save each tick, which is quite manageable (please correct me if I am way off here). However, once in a while I do see short-term volume spikes (e.g., news events; I once saw 500 contracts sold for NG in one second), so I was wondering how fast my algorithm needs to be in order to take in all the ticks. So, if I use 1000 trades/sec as my average baseline, what kind of instantaneous volume spikes would I end up seeing? The original question was a result of this issue.
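
The arithmetic above can be checked with a few lines. This is only a sketch of the back-of-the-envelope estimate; the function names are made up for the example, and the inputs are the figures assumed in the post, not measured data.

```cpp
// Average tick rate during the busy window, in ticks per second.
inline double ticks_per_second(double ticks_per_day, double busy_fraction,
                               double busy_hours) {
    return ticks_per_day * busy_fraction / (busy_hours * 3600.0);
}

// Time available per tick at a given rate, in microseconds.
inline double budget_us(double rate_per_second) {
    return 1e6 / rate_per_second;
}
```

With 20 million ticks per day, 80% of them in 5 hours, `ticks_per_second(20e6, 0.8, 5.0)` gives about 889 ticks/sec, so the per-tick budget is about 1125 microseconds, in line with the rounded figures of 1000 ticks/sec and 1 ms. A burst of 10,000 ticks/sec would shrink the budget to 100 microseconds.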

So, folks, please enlighten me here, since I don't have much experience with tick data. What average trades/sec number do you normally see during the day, and what kind of volume spikes do you observe once in a while? How many micros do we have to take in a tick without building up a lag?

appreciate your suggestions....

volpunter · Dec 15, 2014

I still do not fully comprehend your question; are you asking about the storing mechanism of the incoming ticks and related latencies? Or are you asking about retrieving the stored ticks? In case you refer to storage, you do not need to care about latency sensitivity when it comes to storing the data. You store a buffer in memory and periodically persist chunks of data in memory. Does not matter whether you deal with 1 million ticks a day or 50 million.

rohan2008 · Dec 15, 2014

volpunter said:
I still do not fully comprehend your question; are you asking about the storing mechanism of the incoming ticks and related latencies? Or are you asking about retrieving the stored ticks?

The former: within what time should we store an incoming tick in order to avoid a lag on the trading system side? For 1000 ticks per second, as my rough math suggested, it's about 1 ms, which is quite reasonable. But do we see volume spikes in the tick data that can go up to, say, 10,000 ticks per second for a few seconds, which would give me 100 micros? Even this is manageable for futures. STL might be more than sufficient for this, and if need be, I can go with the contiguous single-threaded array approach if we want to get it down to the nanosecond level. Guess I got my answer, but any info about volume spikes that you notice would be helpful. Thanks.

volpunter · Dec 15, 2014

I recommend you separate the concepts of storing ticks and processing ticks as part of your strategy engine. Your strategy engine should ideally be able to process all incoming ticks, even during volume bursts. If that is an issue, you may "sample" incoming tick data (see Interactive Brokers' technology for reference). That is obviously not an option if you trade high frequency strategies where your model depends on micro market dynamics.
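
One possible reading of "sampling" is conflation: keep only the latest price and the accumulated volume per contract, and let the strategy read that snapshot at its own pace. The sketch below assumes this interpretation; `Conflator`, `Snapshot` and the 40-contract limit are hypothetical, and this is not a description of how any broker implements it.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kContracts = 40;

// Aggregated view of the ticks seen since the last read.
struct Snapshot {
    double        last_price = 0.0;
    std::uint64_t volume = 0;   // summed trade size since the last take()
    std::uint32_t ticks = 0;    // number of ticks folded into this snapshot
};

class Conflator {
public:
    // Fold one incoming tick into the pending snapshot for its contract.
    void on_tick(std::size_t contract, double price, std::uint32_t size) {
        if (contract >= kContracts) return;
        Snapshot& s = pending_[contract];
        s.last_price = price;
        s.volume += size;
        ++s.ticks;
    }

    // Return the pending snapshot and reset its counters.
    // The last price is kept so a quiet contract still reports it.
    Snapshot take(std::size_t contract) {
        if (contract >= kContracts) return Snapshot{};
        Snapshot out = pending_[contract];
        pending_[contract].volume = 0;
        pending_[contract].ticks = 0;
        return out;
    }

private:
    std::array<Snapshot, kContracts> pending_{};
};
```

This version is single-threaded; if the feed and the strategy run on different threads, access to the snapshots would need synchronization.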

Now, an entirely separate topic is the storage of incoming ticks. That should not impact your actual trading framework at all. You want to handle the storage on an entirely different thread/task/process. Latency and performance are not critical in this area. You should concern yourself much more with persisting data in a safe and memory/space-efficient way.
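
One common way to hand ticks to a separate storage thread is a single-producer/single-consumer ring buffer. The sketch below is an illustration under that assumption, not the design anyone in this thread describes using; `RecordedTick` and `SpscTickQueue` are hypothetical names.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical record handed from the feed thread to the storage thread.
struct RecordedTick {
    std::int64_t  ts_ns;
    double        price;
    std::uint32_t size;
    std::uint32_t contract;
};

// Single-producer/single-consumer ring buffer.
// Exactly one thread may call try_push and exactly one may call try_pop.
template <std::size_t Capacity>
class SpscTickQueue {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");
public:
    // Feed thread: returns false if the queue is full (consumer is behind).
    bool try_push(const RecordedTick& t) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;
        slots_[head & (Capacity - 1)] = t;
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Storage thread: returns false if there is nothing to read.
    bool try_pop(RecordedTick& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail == head) return false;
        out = slots_[tail & (Capacity - 1)];
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }

private:
    std::array<RecordedTick, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};  // next slot to write
    std::atomic<std::size_t> tail_{0};  // next slot to read
};
```

The feed callback would call `try_push` and return immediately, while the storage thread drains the queue with `try_pop` and writes batches out at its own pace. What to do when `try_push` returns false (drop, count, or grow the buffer) is a policy decision left open here.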

I recommend you define exactly which latency you are concerned with. Storing data in the most time efficient manner imho is not important.

Butterfly · Dec 17, 2014

volpunter said:
You store a buffer in memory and periodically persist chunks of data in memory.

volpunter, don't try to use language you don't fully understand, it makes you look foolish, again !!! LOL

jesus christ, what a silly amateur you make. Hopefully no company is foolish enough to pay you !!!

931 · Jun 28, 2020

rohan2008 said:
No, don't get me wrong; STLs are great libraries, no question about that. I personally use STL libraries (Poco actually) and other C++11 features pretty extensively in the trading system that I have developed. Having said that, I have personally observed this over the years:

The architecture of any high performance system can be divided into critical paths (the 5% of the code that executes 95% of the time) and non-critical paths (the remaining 95% of the code, where execution happens 5% of the time). As I see it, STLs are awesome for the non-critical paths; I use them extensively. However, I have seen direct performance degradation when STLs are used in the critical code path. I can point you to real-life examples where this happened here in Silicon Valley. Most often, companies developing user space device drivers in C++ hire Linux kernel device driver developers and ask them to learn C++ and develop code in the critical code paths. Every person who learns C++ gets fascinated with containers (including myself) initially, uses containers in the critical code, finds out that the code needs to run faster, experiments with replacing containers, and then finds out that traditional C data structures dramatically improve performance.

To give you some numbers, I have used G++ 4.7 with the standard clib on Linux 3.11 (I guess) and I found that new() takes 3 times more clock cycles than malloc(); we have confirmed this by running perf tools over tcmalloc. This has a direct impact on the way we structure our code in the critical path. Too many object allocations bring down the performance. Another example I can give you is std::stringstream (stay away from it in critical paths). Vectors are slow compared to linked lists; I have had first-hand experience with this. Variadic templates as function arguments are fast, but when used in class constructions they literally drag down the performance. Class instantiations take time as well. One example I can quote from my professional life: we once had to design a file system checker/fixer, and we had a situation where a loop laden with C++ container code took 64 secs to iterate 750 x 7 million times in an O3 build. Over a few weeks, we replaced all the containers and class instantiations in that while loop with hand-optimized code and were able to reduce this to 12 secs! I haven't taken this issue up with Bjarne, because finishing the trading system and my professional assignments is a higher priority for me at this stage in my life. I know that to prove my assertions I would have to spend quite a bit of time getting some numbers; unfortunately I can't afford to spend time on all this.

Either way, I always advocate hand optimizing all the code that falls in the critical path. Since, as implied in my earlier post, all the transactions that happen in the US futures market go through this component that I am designing; this is definitely one of those critical path areas and so I wanted to know how others approach it.

rohan2008 said:
Vectors are slow compared to linked lists

Maybe I cut context out, but can you elaborate on that?