Question on sharing accumulated quotes between ZeroMQ processes

nitro · Mar 12, 2016

botpro said:
Here are the functions I mentioned for fast, direct reading and writing of memory between processes without using shared memory, provided by newer Linux kernels:

This article is good, but is outdated (the function names have changed since, IIRC):
https://lwn.net/Articles/405346/

These are the man pages of the final function names:
$ man process_vm_readv
$ man process_vm_writev

I looked into this about 2 years ago and I wish I could remember what I came up with, but I decided against it. I just can't remember why. It will come back to me if I look at my code base again.

botpro · Mar 12, 2016

If all modules run on the same machine and are written in the same language as the main process (or can be linked in),
then the best and fastest solution would be a single process with multiple threads, as vicirek already stated.
Nowadays such trading machines have many CPU cores and usually at least 16GB of RAM, so everything can run on the same machine.
There is no need for any message passing protocol then, as one can call the methods directly
(or use a producer/consumer queue model); one just needs to use one or more mutexes
for shared access to the resources, i.e. resource locking. All of that has been available since C++11, and even longer in Boost.
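
A minimal sketch of the producer/consumer queue model mentioned above, assuming C++11 and only the standard library. The `Quote` fields and the `BlockingQueue` name are made up for illustration; this is not code from anyone in the thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical quote record; the field names are assumptions for illustration.
struct Quote {
    int symbol_id;
    double bid;
    double ask;
};

// Minimal blocking queue: producers push, consumers pop,
// and a single mutex guards the shared state.
template <typename T>
class BlockingQueue {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!closed_ && "push after close");
            items_.push(std::move(item));
        }
        ready_.notify_one();  // wake one waiting consumer
    }

    // Blocks until an item is available or close() has been called.
    // Returns false once the queue is closed and drained.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

    // Signals that no more items will be pushed.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();  // release every waiting consumer
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
    bool closed_ = false;
};
```

In this model a feed thread would call push() for each incoming quote, and a strategy thread would loop on pop() until it returns false after close().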

nitro · Mar 12, 2016

botpro said:
If all modules run on the same machine and are written in the same language as the main process (or can be linked in),
then the best and fastest solution would be a single process with multiple threads, as vicirek already stated.
Nowadays such trading machines have many CPU cores and usually at least 16GB of RAM, so everything can run on the same machine.
There is no need for any message passing protocol then, as one can call the methods directly
(or use a producer/consumer queue model); one just needs to use one or more mutexes
for shared access to the resources, i.e. resource locking. All of that has been available since C++11, and even longer in Boost.

The whole idea is to not have one massive monolithic program and instead to break the program up into processes, assuming it is not HFT [which is almost a joke to talk about on ET]:

1) It scales far better and it is more fault tolerant.
2) Multi-threading is hard and extremely risky. Designing at the process level instead and having the processes communicate is much, much easier, and you are far less likely to end up with a program that bankrupts you through a trivial but obscure threading bug.

It all depends on what he is intending to do. I have had to design systems that could scale to thousands of symbols with a quote depth of 100 on each side and have it all respond in about 300 micros. That is impossible to do on even a massive-core, massive-memory machine.

Even in that case, I would go single process communicating either with ZMQ or just raw shared memory.

botpro · Mar 12, 2016

nitro said:
The whole idea is to not have one massive monolithic program and instead to break the program up into processes, assuming it is not HFT [which is almost a joke to talk about on ET]:

1) It scales far better and it is more fault tolerant.
2) Multi-threading is hard and extremely risky. Designing at the process level instead and having the processes communicate is much, much easier, and you are far less likely to end up with a program that bankrupts you through a trivial but obscure threading bug.

It all depends on what he is intending to do. I have had to design systems that could scale to thousands of symbols with a quote depth of 100 on each side and have it all respond in about 300 micros. That is impossible to do on even a massive-core, massive-memory machine.

Even in that case, I would go single process communicating either with ZMQ or just raw shared memory.

Yes, multithreading is for professional programmers; newbies will need a lot of experience to get it right.
I myself prefer multithreading.
And, of course, one should never let newbies write such important software... ;-)

vincegata · Mar 13, 2016

@botpro @vicirek I have a system that consists of four processes and runs on a
single box. I am using C++ threading, along with 1970s-style POSIX FIFOs,
sockets, and select(). Two of the processes write data into a FIFO on the
main thread and listen for incoming data on a child thread. It all works
well and fast; however, it gets complicated when I want to get
data from multiple sources, such as downloading interest rates, options prices,
news, etc. besides the quotes. It is also difficult to have multiple strategy
modules. Note that the strategy module needs read access to all of that
accumulated data. Hence, I am pretty much set on ZeroMQ so I won't have to
worry about race conditions, critical sections, etc.

Reiterating @nitro, ZMQ supports multiple transports: in-process (inproc),
inter-process (ipc), TCP, and PGM/EPGM multicast. Switching between them is a
matter of passing a different endpoint string when binding or connecting a
socket. ZMQ is certainly not easy to learn, but it solves all the
scalability issues. You can configure it to have multiple publishers, multiple
workers, and multiple subscribers, and it's done in a few lines of code.

What I need is for the consumers - strategy process(es) - to access the
accumulated data. I need a way to store the data in, say, a
std::vector<class Quote> and have other processes easily access that data
using iterators.

vincegata · Mar 13, 2016

2rosy said:
You don't need to share anything; just pub/sub quotes from the aggregator to the consumers. Do all your consumers need all the quotes?

Consumers do need all the quotes, e.g. if I want to calculate a moving average and standard deviation.
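
For illustration, a moving average and standard deviation over the last N quotes can be kept in a small local window inside the consumer. This is only a sketch under assumptions: `RollingStats` is a hypothetical helper, the window is counted in quotes rather than time, and the standard deviation is the population form (divide by N).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <deque>

// Hypothetical helper: keeps the last `window` prices and reports their
// mean and population standard deviation.
class RollingStats {
public:
    explicit RollingStats(std::size_t window) : window_(window) {
        assert(window > 0);
    }

    // Adds one price and drops the oldest once the window is full.
    void add(double price) {
        prices_.push_back(price);
        if (prices_.size() > window_) prices_.pop_front();
    }

    bool full() const { return prices_.size() == window_; }

    double mean() const {
        assert(!prices_.empty());
        double sum = 0.0;
        for (double p : prices_) sum += p;
        return sum / static_cast<double>(prices_.size());
    }

    // Population standard deviation (divides by N, not N - 1).
    double stddev() const {
        const double m = mean();
        double sq = 0.0;
        for (double p : prices_) sq += (p - m) * (p - m);
        return std::sqrt(sq / static_cast<double>(prices_.size()));
    }

private:
    std::size_t window_;
    std::deque<double> prices_;
};
```

The statistics are recomputed on each call, which is simple and numerically safe but costs O(N) per query; a running-sum variant would be faster at the price of some floating-point drift.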

2rosy · Mar 13, 2016

vincegata said:
What I need is for the consumers - strategy process(es) - to access the
accumulated data. I need a way to store the data in, say, a
std::vector<class Quote> and have other processes easily access that data
using iterators.

Usually each process consumes only the data it needs. So if you have a strategy trading corn, it doesn't need to consume oil quotes. It gets a corn quote message, deserializes it, and stores it or uses it somehow.
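
ZeroMQ SUB sockets filter by matching a subscribed prefix against the start of each message, which is how a corn strategy can ignore oil quotes. The snippet below is a plain C++ model of that prefix rule, not ZeroMQ's API; `is_subscribed` and the topic strings are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns true if `topic` starts with any of the subscribed prefixes.
// An empty prefix matches every topic.
bool is_subscribed(const std::string& topic,
                   const std::vector<std::string>& prefixes) {
    for (const std::string& p : prefixes) {
        // compare() on the first p.size() characters; a topic shorter
        // than the prefix can never match.
        if (topic.compare(0, p.size(), p) == 0) return true;
    }
    return false;
}
```

With prefixes {"quote.corn"}, a topic such as "quote.corn.ZC" passes while "quote.oil.CL" is dropped before the strategy ever has to deserialize it.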

botpro · Mar 13, 2016

Hopelessly inefficient and complicated... My last word.
Wish you much luck with your project, you will need it.

vincegata · Mar 13, 2016

2rosy said:
Usually each process consumes only the data it needs. So if you have a strategy trading corn, it doesn't need to consume oil quotes. It gets a corn quote message, deserializes it, and stores it or uses it somehow.

Sure, that's the way to do it without sharing the data between processes. I'd have to lock it, like I currently do, otherwise I'd get a segmentation fault. I do want to share, though: some data doesn't need to be shared, but other data does.

hoppla · Mar 14, 2016

We've been using ZMQ in production for about a year now and I have only good things to say about it. It's really worth studying the guide and looking at the different design patterns. They are useful to know and understand even when doing everything in-process without (ZMQ) sockets and just passing messages around, i.e. via queues or other means. To me it is a much smarter approach to sharing state than using userland locks or mutexes. You also don't have to worry about using any concurrent data structures as long as you stick to the paradigm. I'd even go so far as to say that building distributed architectures with ZeroMQ is really easy once you have all the socket types down.

To be honest, I have not benchmarked it against raw sockets or a homegrown API built on top of native sockets. For us, the latency added by ZMQ is nothing compared to the development time saved and to the overall execution latencies we have anyway, which are beyond our control without DMA. It is simply not worth thinking about.

Re your question "So, my question is, how to access the accumulated data between the modules?":

There are a few ways to go about it, and it all depends on your requirements. You could have a central broker (or XPUB/XSUB proxy in 0MQ lingo): your OMS, strategy modules... subscribe to its XPUB socket, and your different feeding entities (i.e. one for rates, one for option prices and so on) publish to its XSUB socket. Dependent modules then subscribe based on the topics or topic tree you have determined.

This is very simple to set up; however, you don't have any "recall", so to speak, i.e. if one of your strategy modules comes on late, is restarted and so forth, it may miss data. So you need some sort of channel to recover the data if it went missing, and a way to determine that data has gone missing in the first place (i.e. via message sequencing). This may be a bit more complicated than 0MQ's Last Value Caching paradigm. We've gone with a variation on the Router/Dealer proxy to recover missing data. For this to work you will have to implement some sort of communication protocol between entities.
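
The message sequencing idea can be sketched as below. This assumes each publisher stamps its messages with a counter that increases by exactly 1; `SequenceTracker` is a hypothetical name, and this is not the protocol hoppla's team actually uses. The recorded gaps are what a module would then request over its recovery channel.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical per-publisher tracker. Assumes the publisher stamps each
// message with a sequence number that increases by exactly 1.
class SequenceTracker {
public:
    // Inclusive range of missing sequence numbers.
    using Gap = std::pair<std::uint64_t, std::uint64_t>;

    // Records one received sequence number.
    // Returns true if this message revealed a new gap.
    bool observe(std::uint64_t seq) {
        if (!started_) {  // first message: nothing to compare against
            started_ = true;
            next_expected_ = seq + 1;
            return false;
        }
        if (seq < next_expected_) return false;  // duplicate or late message
        const bool gap = seq > next_expected_;
        if (gap) gaps_.push_back(Gap(next_expected_, seq - 1));
        next_expected_ = seq + 1;
        return gap;
    }

    // Ranges detected so far; this sketch never removes recovered ranges.
    const std::vector<Gap>& gaps() const { return gaps_; }

private:
    bool started_ = false;
    std::uint64_t next_expected_ = 0;
    std::vector<Gap> gaps_;
};
```

Note the limitation for late joiners: the tracker starts from the first sequence number it sees, so anything published before that has to be recovered some other way, for example a snapshot request.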

I can see why other members say this is overkill, but as soon as you try to scale to a second machine you don't have much choice. Also, with 0MQ this is quite easily done IMO, so the development time will be a lot shorter compared to other solutions. Plus, at some point coding this way becomes a cost saver if you write against some third-party APIs/providers.