Question on sharing accumulated quotes between ZeroMQ processes

vincegata · Mar 11, 2016

Hi,

I've developed an ATS using multithreading and FIFO. Though it's working, it's
very difficult to expand, e.g. going from a single data access module to
multiple. Hence, I've decided to switch to ZeroMQ. I am using C++ on Linux and I
am processing every quote.

What I would like to achieve is to have one or more data access modules
(processes), data aggregator module, strategy module, OMS/Risk module, GUI
module, and order execution module all communicating using ZeroMQ.

I believe I've planned it all correctly as ZeroMQ stands, however I am not sure
what is the best way for the strategy module to access the accumulated quotes.
One way is to drop the data aggregator module and accumulate quotes in the
strategy module, but then I am back to multithreading. And once I decide to have
multiple strategy modules, I will be back to the scalability issue.

So, my question is, how to access the accumulated data between the modules?

I am looking into POSIX shared memory, mapped memory, Boost.Interprocess, and
Redis. I want a solution that is easy to implement, reliable, and fast enough. I am
not after every microsecond, but I run backtesting on the same platform, so I do not
want to slow it down when it loads the quotes from disk.

TIA

botpro · Mar 11, 2016

I don't know ZeroMQ, so here is a generic answer/suggestion:

It depends on the architecture of your modules, i.e. are they within the same program, or are they external processes?
If they are internal, then just create a "DataServer" class which manages and accesses the data,
and have the modules call the methods of the DataServer.
Of course you would need one or more mutexes to coordinate the shared access.
I've worked with more than 100 threads and it's no problem, but the shared access must be
coordinated with mutexes etc. Best is to use a recursive mutex (part of C++11), as it saves you a lot of headaches with deadlocks etc.
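
A minimal sketch of the "DataServer" idea might look like the following. The class layout, the `Quote` fields, and the method names are assumptions made for illustration; only the use of `std::recursive_mutex` comes from the suggestion above. The `mid()` method shows the case a recursive mutex is meant to handle: a locked method calling another locked method on the same thread.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Hypothetical quote record; the field names are illustrative only.
struct Quote {
    double bid;
    double ask;
};

// Hypothetical "DataServer": one object owns the data and every access
// goes through a method that takes the lock.
class DataServer {
public:
    // Store the latest quote for a symbol.
    void update(const std::string& symbol, const Quote& q) {
        assert(!symbol.empty());
        std::lock_guard<std::recursive_mutex> lock(mutex_);
        quotes_[symbol] = q;
    }

    // Copy the latest quote into `out`; returns false if the symbol is unknown.
    bool latest(const std::string& symbol, Quote& out) const {
        std::lock_guard<std::recursive_mutex> lock(mutex_);
        auto it = quotes_.find(symbol);
        if (it == quotes_.end()) return false;
        out = it->second;
        return true;
    }

    // Calls latest() while already holding the lock. With a plain std::mutex
    // this re-entrant locking would be undefined behaviour (typically a
    // deadlock); a recursive mutex allows it on the owning thread.
    bool mid(const std::string& symbol, double& out) const {
        std::lock_guard<std::recursive_mutex> lock(mutex_);
        Quote q;
        if (!latest(symbol, q)) return false;
        out = (q.bid + q.ask) / 2.0;
        return true;
    }

private:
    mutable std::recursive_mutex mutex_;
    std::map<std::string, Quote> quotes_;
};
```

Returning copies rather than references keeps readers from touching the map after the lock has been released.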

OK, I looked it up on the net: ZeroMQ is a message-passing library, so it seems your modules are external.
Then the same principle applies as written above: you need to centralise, and thereby coordinate, the shared access to the data,
because there is one writer and multiple readers.

But for internal modules I wouldn't recommend such a message-passing protocol, because the overhead (latency) is too big;
you can have it much cheaper, i.e. much faster, by using multithreading.

vincegata · Mar 11, 2016

Hi botpro, my modules are processes on Linux, so each one of them is an application in its own right.

I am currently using C++ STL multithreading, e.g. I have a class that wraps an STL map with a mutex, where I accumulate the quotes. Such an architecture is hard to scale.

botpro · Mar 11, 2016

Then it would indeed make sense to use shared memory. But be aware that it too has some shortcomings, e.g. you can't store pointers there, since each process may map the region at a different address.
Best is to store an array of basic PODs there, excluding pointers.
You can then also do the message passing over a shared memory region, which would be much faster than your current or planned variant.
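
As a rough sketch of the "array of PODs, no pointers" layout: the code below formats a raw memory region as a small header followed by fixed-size quote records that are addressed by index. All struct and function names are hypothetical. The region itself would normally come from something like `shm_open` plus `mmap`; here any writable block of bytes stands in for it, and cross-process synchronization (who may write when) is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical fixed-size quote record: plain data only, no pointers and no
// std::string, so the bytes mean the same thing in every process.
struct Quote {
    char          symbol[8];     // NUL-padded ticker
    double        bid;
    double        ask;
    std::uint64_t timestamp_ns;
};
static_assert(std::is_trivially_copyable<Quote>::value,
              "records in shared memory must be trivially copyable");

// Header stored at the start of the region.
struct RegionHeader {
    std::uint32_t capacity;  // number of Quote slots that fit in the region
    std::uint32_t count;     // number of slots written so far
};

// Byte offset of slot i from the start of the region (an index, not a pointer).
inline std::size_t slot_offset(std::uint32_t i) {
    return sizeof(RegionHeader) + static_cast<std::size_t>(i) * sizeof(Quote);
}

// Writer side: format an empty region of `bytes` bytes.
inline void init_region(void* region, std::size_t bytes) {
    assert(bytes >= sizeof(RegionHeader));
    RegionHeader h;
    h.capacity = static_cast<std::uint32_t>((bytes - sizeof(RegionHeader)) / sizeof(Quote));
    h.count = 0;
    std::memcpy(region, &h, sizeof h);
}

// Writer side: append one record; returns false when the region is full.
inline bool append_quote(void* region, const Quote& q) {
    RegionHeader h;
    std::memcpy(&h, region, sizeof h);
    if (h.count == h.capacity) return false;
    std::memcpy(static_cast<unsigned char*>(region) + slot_offset(h.count), &q, sizeof q);
    ++h.count;
    std::memcpy(region, &h, sizeof h);
    return true;
}

// Reader side: copy record i into `out`; returns false if i is out of range.
inline bool read_quote(const void* region, std::uint32_t i, Quote& out) {
    RegionHeader h;
    std::memcpy(&h, region, sizeof h);
    if (i >= h.count) return false;
    std::memcpy(&out, static_cast<const unsigned char*>(region) + slot_offset(i), sizeof out);
    return true;
}
```

Because readers only ever use indices, the layout keeps working even when each process maps the region at a different address.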

But there is an even faster method in recent Linux versions: directly accessing or copying the memory of other processes.
I would need to look up the name(s) of this method in my archives/library if it interests you.

vicirek · Mar 11, 2016

I think the system is too complex. ZeroMQ and multiple processes would make more sense if you plan to run some of them on different machines.

I run a single-process multithreaded system where modules communicate using circular buffers designed for this environment: one thread reads from a buffer while another writes to it. Each module starts its own thread, so they work independently. I do not have even one mutex, just buffer overload guards, and I mostly pass pointers around unless the data size is small. My charting module only reads data, and it accesses the data continuously from a different thread with no problems. I just have to be careful not to use iterators that could be invalidated, and when using vectors I have to call reserve() to avoid memory reallocations. That last problem does not occur with lists, for example. Again, no mutex or any other synchronization is required. There is a central module in my app that could potentially send a pointer to stored ticks to multiple modules instead of one, receive order instructions back, and then forward them to the module that connects to the broker and feed.
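
For readers unfamiliar with the pattern, a single-producer/single-consumer circular buffer could be sketched roughly as below. This is not the poster's implementation: the class name and interface are hypothetical, and the sketch uses `std::atomic` indices with acquire/release ordering in place of a mutex, which is one common way to make the one-reader/one-writer case well defined in C++11.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Single-producer / single-consumer ring buffer. Exactly one thread may call
// push() and exactly one other thread may call pop().
template <typename T>
class SpscRing {
public:
    // One slot is kept empty to tell "full" apart from "empty".
    explicit SpscRing(std::size_t capacity)
        : buf_(capacity + 1), head_(0), tail_(0) {
        assert(capacity > 0);
    }

    // Producer side. Returns false when the buffer is full (the overload guard).
    bool push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % buf_.size();
        if (next == head_.load(std::memory_order_acquire)) return false;  // full
        buf_[tail] = value;
        tail_.store(next, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer side. Returns false when the buffer is empty.
    bool pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return false;  // empty
        out = buf_[head];
        head_.store((head + 1) % buf_.size(), std::memory_order_release);  // free the slot
        return true;
    }

private:
    std::vector<T> buf_;             // sized once in the constructor, never reallocated
    std::atomic<std::size_t> head_;  // next slot to read, advanced only by the consumer
    std::atomic<std::size_t> tail_;  // next slot to write, advanced only by the producer
};
```

The storage is allocated once up front, which matches the point above about avoiding vector reallocations while another thread is reading.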

Before designing it I also checked options in regards to interprocess communication and decided that it is not worth it partly because today's hardware is so powerful and most importantly having modules working in the same address space is a very attractive proposition. However, if I need to go with separate applications I just use sockets to do this job.

botpro · Mar 12, 2016

Here are the functions I mentioned for fast direct reading/writing of memory between processes without using shared memory, provided by newer Linux kernels:

This article is good but outdated (the function names have changed since, IIRC):
https://lwn.net/Articles/405346/

These are the man pages of the final function names:
$ man process_vm_readv
$ man process_vm_writev
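
A small wrapper around `process_vm_readv` might look like this. It is a sketch only: the helper name `read_from_process` is made up, and it assumes Linux with glibc, that the reader already knows the target's pid and the remote address (for example, sent over a message channel), and that it has ptrace-level permission to access that process.

```cpp
#include <sys/types.h>
#include <sys/uio.h>   // process_vm_readv (Linux-specific)
#include <unistd.h>
#include <cassert>
#include <cstddef>

// Hypothetical helper: copy `len` bytes from address `remote_addr` in process
// `pid` into `local_buf`. Returns true only if every byte was transferred.
inline bool read_from_process(pid_t pid, const void* remote_addr,
                              void* local_buf, std::size_t len) {
    assert(local_buf != nullptr);

    struct iovec local;
    local.iov_base = local_buf;
    local.iov_len = len;

    struct iovec remote;
    remote.iov_base = const_cast<void*>(remote_addr);  // address in the other process
    remote.iov_len = len;

    // One local and one remote segment; the flags argument must currently be 0.
    const ssize_t n = process_vm_readv(pid, &local, 1, &remote, 1, 0);
    return n == static_cast<ssize_t>(len);
}
```

The call can fail (returning -1) when permissions are missing or the remote address is invalid, so callers should check the result rather than assume the copy happened.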

globalarbtrader · Mar 12, 2016

Interesting thread, as I plan to use ZeroMQ in my current refactoring (I currently use databases to pass prices etc., which is fine for slow trading). Just to say, I prefer multiple processes as it makes everything more robust. GAT

2rosy · Mar 12, 2016

You don't need to share anything, just pub/sub quotes from the aggregator to the consumers. Do all your consumers need all the quotes?
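
If consumers do not need every quote, pub/sub lets each one subscribe to a subset. ZeroMQ SUB sockets filter by prefix: a subscription set with the `ZMQ_SUBSCRIBE` option matches every message that begins with the given bytes, and an empty subscription matches everything. The sketch below only illustrates that matching rule in plain C++; the `Subscriber` class is hypothetical and is not part of the ZeroMQ API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustration of prefix-based topic filtering. This mimics the matching rule
// only; it does not send or receive anything.
class Subscriber {
public:
    // Register interest in every topic that starts with `prefix`.
    void subscribe(const std::string& prefix) { prefixes_.push_back(prefix); }

    // A message is wanted if its topic starts with any subscribed prefix.
    // An empty prefix matches every topic; no subscriptions match nothing.
    bool wants(const std::string& topic) const {
        for (const std::string& p : prefixes_) {
            if (topic.compare(0, p.size(), p) == 0) return true;
        }
        return false;
    }

private:
    std::vector<std::string> prefixes_;
};
```

With the symbol placed at the start of each message, a strategy that trades only EUR pairs could subscribe to "EUR" and never see the rest of the feed.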

nitro · Mar 12, 2016

vicirek said:
I think the system is too complex. ZeroMQ and multiple processes would make more sense if you plan to run some of them on different machines.
...

zmq_ipc
zmq_inproc

https://www.google.com/webhp?sourceid=chrome-instant&rlz=1C1CHZL_enUS681US683&ion=1&espv=2&ie=UTF-8#q=zeromq intra process

The beauty of ZMQ is that taking a design from one machine to a cluster of machines takes a one-line code change.
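
To illustrate the "one line" point: with ZeroMQ the transport is selected by the endpoint string passed to `zmq_bind()` / `zmq_connect()`, so moving between threads, local processes, and machines mostly means changing that string. The helper below is a hypothetical sketch; the enum, the function name, and the endpoint names ("quotes", "/tmp/quotes.ipc", port 5556) are placeholders, and it builds the connecting side's endpoint only.

```cpp
#include <cassert>
#include <string>

// Hypothetical deployment modes for the quote feed.
enum class Deployment { SameProcess, SameMachine, Network };

// Build the endpoint string a consumer would hand to zmq_connect().
// Note that inproc:// only works between sockets sharing one ZeroMQ context.
inline std::string quote_endpoint(Deployment d, const std::string& host = "127.0.0.1") {
    assert(!host.empty());
    switch (d) {
        case Deployment::SameProcess: return "inproc://quotes";
        case Deployment::SameMachine: return "ipc:///tmp/quotes.ipc";
        case Deployment::Network:     return "tcp://" + host + ":5556";
    }
    return std::string();  // not reached for valid enum values
}
```

The rest of the socket code (socket types, send and receive calls) stays the same across all three transports.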

It is not the best way to go for HFT, but it works great for anything in the 100 to 500 microsecond range.

nitro · Mar 12, 2016

I designed almost exactly this system about 5 years ago with extremely complex requirements that I don't want to go into here. I added an extra layer of abstraction using Google Protocol Buffers because I wanted to be language and OS agnostic.

ZMQ supports a guaranteed "multicast" delivery of Pub/Sub called PGM (Pragmatic General Multicast). That will solve most of your problems if you do not need latencies below, say, 100 microseconds.