Server options

Discussion in 'Automated Trading' started by cjbuckley4, Jun 17, 2015.

  1. cjbuckley4

    I'm hoping someone here could offer me some guidance with regard to servers. Here is my problem:

    I've written a C# application that connects to IQFeed over TCP/IP and receives both Level 1 trade and quote data and depth of market data. The program takes the incoming messages, parses them into a struct, and writes them to a structured binary file. I've tested it on my home machine with a 3.4 GHz Intel Ivy Bridge processor and a cheap SSD. When watching 10-20 E-mini contracts, it consistently uses less than 7% of my processor, though I haven't watched it through every market scenario or during peak hours, since I have an internship and school and whatnot. Consider 8% my best conservative guess. The program is not yet multithreaded, but I hope to learn how to do that properly soon. I also plan to add a similar depth of market feed handler for Rithmic in the near future, but since I'm still hammering out the final details before deployment, I see no reason to pay for multiple feeds yet. This is more of a "case study" in how to do this properly at the moment.

    The purpose of this program is to persist incoming depth of market and trade and quote data for research purposes. I am doing this because historical depth of market data is quite expensive, difficult to find, and impossible to judge for quality unless you record it yourself. Having spoken to many HFT folks (some on this forum, whose time I greatly appreciate), they assure me that this is the way to go. To do this, I must deploy the program to a server. I am not interested in a colocation solution; the discussions here labelled "colocation" do not even pertain to real colocation. Call or email someone for a quote and you'll see what I'm talking about. This will simply be a server solution, perhaps *proximity hosted* at Cermak, because I would like to avoid the public internet as much as possible. IQFeed doesn't disseminate from Chicago, so my reason for wanting to host there is mainly to reduce possible points of failure and make the transition to Rithmic or TT/CQG/CTS smoother when it happens. If hosting elsewhere is dramatically cheaper, I'm open to that as well.

    I do record the latency of every incoming message, but that's only as accurate as the Windows system clock and whatever IQFeed does to normalize time, so I don't put a whole lot of stock into it...more of a heuristic. Additionally, my program and IQFeed are written in C# and must run on Windows Server 2008 or higher. I promise I like Linux as much as anyone here, but I'm even less interested in discussing that than colocation, so let's just assume I'm not flexible on the OS. My main concern is handling the feeds without missing a tick and keeping good uptime; latency comes second.

    My usage will be as follows: I will start by tracking only the E-minis and some major commodity futures, as well as exchange-supported spreads. I will likely scale up to the full allowance of 500 symbols at some point. Once I add Rithmic, the demands will climb even higher, as I plan to track the full allowable depth of book on at least 100 contracts at a time in addition to my original IQFeed subscriptions.

    So now that I've laid out what I have going on, here's my question: what kind of server accommodations do I need to pull this off in a scalable yet cost-effective manner? Although the name doesn't inspire much confidence, this site has a variety of VPS, VDS, and dedicated server options that might be a launching point for this discussion, although few are at Cermak. If anyone here has experience with this sort of thing, please offer whatever info you can. I'm hoping that by writing to binary files incrementally and keeping the file sizes fairly small, I can keep this rather cheap, but I simply have no idea what is necessary beyond what I've described above about watching E-minis at home. Also, if anyone has experience with logging server resource usage, I'd be interested in hearing about it. I know there's Performance Monitor in Windows Server, but as you can see, I'm just trying to get an idea of how all this works. Finally, would it be better to store the IQFeed message structures in a queue and write them to the binary file from a dedicated thread, rather than writing them directly as they come in?
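    To make the question concrete, here is a rough sketch of the queue approach I have in mind (illustrative C# only; the type and field names are made up and this is not my actual code):

    [CODE]
    // Rough sketch of the queue-plus-dedicated-writer idea (illustrative only).
    using System.Collections.Concurrent;
    using System.IO;
    using System.Threading.Tasks;

    public struct TickRecord              // hypothetical parsed IQFeed message
    {
        public long TimestampTicks;
        public int SymbolId;
        public double Price;
        public int Size;
    }

    public sealed class TickWriter
    {
        private readonly BlockingCollection<TickRecord> _queue =
            new BlockingCollection<TickRecord>(boundedCapacity: 1 << 20);

        // Called from the socket-reading code as messages are parsed.
        public void Enqueue(TickRecord rec) => _queue.Add(rec);

        // Dedicated writer thread: drains the queue and appends to a binary file.
        public Task Start(string path) => Task.Factory.StartNew(() =>
        {
            using (var file = new FileStream(path, FileMode.Append, FileAccess.Write))
            using (var writer = new BinaryWriter(new BufferedStream(file, 1 << 20)))
            {
                foreach (var rec in _queue.GetConsumingEnumerable())
                {
                    writer.Write(rec.TimestampTicks);
                    writer.Write(rec.SymbolId);
                    writer.Write(rec.Price);
                    writer.Write(rec.Size);
                }
            }
        }, TaskCreationOptions.LongRunning);

        // Call on shutdown so the writer can drain the queue and flush.
        public void Complete() => _queue.CompleteAdding();
    }
    [/CODE]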
     
    FCXoptions likes this.
  2. i960

    This is pretty much nothing for a modern CPU to handle. You're literally just processing incoming socket data and serializing it to binary. Even making it multithreaded is not going to yield a significant improvement, because you're I/O bound on the network. The network stack is also queuing while you're serializing, so there's already some implicit concurrency going on that you're not directly in control of. A typical event loop built on a reasonable select/poll() derivative (which Windows has) is a decent approach here.
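    Sketched in C# since that's your stack, a bare-bones version of that loop looks something like this (socket setup is omitted and the onData callback is a placeholder, not any real IQFeed API):

    [CODE]
    // Minimal select()-style read loop over already-connected sockets (sketch).
    using System;
    using System.Collections.Generic;
    using System.Net.Sockets;

    public static class FeedLoop
    {
        public static void Run(List<Socket> feeds, Action<byte[], int> onData)
        {
            var buffer = new byte[64 * 1024];
            while (true)
            {
                // Socket.Select blocks until a socket is readable or the
                // timeout (in microseconds) expires. It trims the list in
                // place, so pass a fresh copy each iteration.
                var readable = new List<Socket>(feeds);
                Socket.Select(readable, null, null, 100 * 1000);

                foreach (var sock in readable)
                {
                    int n = sock.Receive(buffer);
                    if (n == 0) continue;     // peer closed; real code would reconnect
                    onData(buffer, n);        // hand the bytes to the parser/serializer
                }
            }
        }
    }
    [/CODE]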

    I know you don't want to hear about Linux, but you *should* rewrite this in POSIX C using standard Berkeley sockets; then you'll have no issue running on any platform (including Linux). Otherwise you're stuck with a Windows solution and C#.

    Where you will probably start running into issues is with 500+ symbols and pure network I/O load.
     
    Baron and cjbuckley4 like this.
  3. cjbuckley4

    Thanks for your reassuring reply. You're correct about the sockets. I skipped their significantly easier COM API library (which I now understand is just a wrapper around the sockets) because I believed working with the sockets directly would be, foremost, a useful exercise, and marginally faster.

    My hands are pretty much tied with IQFeed, as they don't support Linux; otherwise I would've gone that route, if only because servers are cheaper. I'm a young CS major...not yet that experienced with real-world problems like this, and frankly using WINE with sockets scares me. I am, however, open to doing my Rithmic implementation on a separate Linux server in C++, as my anecdotal research gives me the impression their API is platform independent. In that case I would use POSIX sockets if possible, but I don't even know how their feed works yet, so I don't want to speculate as to whether they expose a plain TCP/IP protocol or something else. Obviously the easy route is to stick with one server and keep everything in C#, but I haven't really gotten to the Rithmic bridge yet. In the fall I'm taking a course on C/C++ in a Linux environment that covers Berkeley sockets, so maybe I'll be better prepared to tackle that part of the project then.

    If I were to watch 500+ symbols, what sort of hardware would you guess is necessary? I hear conflicting things, anywhere from "it will run on your laptop" to "you need a direct line and a network card that costs more than your car," so I'm obviously a bit concerned. From what I've seen, I can get four dedicated cores on 3.0+ GHz Xeon processors at a reasonable price, but that might be overkill...I really don't know.

    I think my next move will be to simply build a list of active symbols across futures and stocks, add them to my watch list, and see what happens.
     
  4. i960

    I think a quad-core machine could handle 500 concurrent symbols without falling over, but it will come down purely to how high the network bandwidth needs are. It won't be latency that's the concern; it'll be not oversaturating the line with that many symbols frequently receiving updates. You don't really need to care about latency if the DOM updates include a timestamp. This is raw bandwidth work: reading bytes off a file descriptor, serializing into an intermediate form, and writing to storage.

    A straightforward POSIX C approach would be libevent with an event loop per core (say 4-8 threads total), non-blocking sockets, and a read callback that parses the incoming data and either writes an intermediate struct to a queue (processed by a separate writer thread) or simply writes it out directly. You'll want the tightest binary encoding possible for space, IMO (e.g., don't use a long int for each level of depth; use an unsigned short, and/or use dynamic encoding with an unsigned char of flags per record indicating the maximum width of the DOM levels in that record). To be portable, structs shouldn't be written directly; they should be packed into a temporary buffer in network byte order and then written out to disk. You'll most likely learn what I'm talking about when you take that class on Berkeley sockets.
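    The packing idea carries over to C# as well; a rough sketch, with a record layout invented purely for illustration:

    [CODE]
    // Pack a depth-of-market record into a compact big-endian (network byte
    // order) buffer instead of dumping the struct's in-memory layout.
    // Field layout is invented for illustration only.
    public static class DomPacker
    {
        // timestamp (8) + symbol id (2) + level (1) + flags (1)
        // + price ticks (4) + size (2) = 18 bytes per record
        public const int RecordSize = 18;

        public static void Pack(byte[] dest, int offset, long timestamp, ushort symbolId,
                                byte level, byte flags, uint priceTicks, ushort size)
        {
            WriteU64(dest, offset + 0, (ulong)timestamp);
            WriteU16(dest, offset + 8, symbolId);
            dest[offset + 10] = level;
            dest[offset + 11] = flags;        // e.g., per-record width/encoding hints
            WriteU32(dest, offset + 12, priceTicks);
            WriteU16(dest, offset + 16, size);
        }

        // Big-endian helpers: most significant byte goes in first.
        private static void WriteU64(byte[] b, int o, ulong v)
        {
            for (int i = 7; i >= 0; i--) { b[o + i] = (byte)v; v >>= 8; }
        }
        private static void WriteU32(byte[] b, int o, uint v)
        {
            for (int i = 3; i >= 0; i--) { b[o + i] = (byte)v; v >>= 8; }
        }
        private static void WriteU16(byte[] b, int o, ushort v)
        {
            b[o] = (byte)(v >> 8);
            b[o + 1] = (byte)v;
        }
    }
    [/CODE]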
     
  5. cjbuckley4

    Eliminating any latency concern was first and foremost for me, so I made sure both feeds were timestamped with millisecond or better precision when I shopped around. That leaves me with only the bandwidth problem you described and the real-time processing. Bandwidth is what it is: I either buy it and have it or I don't. As far as processing goes, I'm pretty much with you on most of what was discussed; even if I didn't have my finger on the exact terms, it's consistent with my intuition on a lot of this. I'll need to get an understanding of multithreading and a better understanding of socket communication before I can really tackle this problem with any certainty that I'm doing it right. Thanks for your advice on the design, though. Things are going to move at a fairly glacial pace due to the steep learning curve, but hopefully the growth of my knowledge will outpace my demand for data by enough to make this whole question a nonissue.
     
  6. i960

    To clarify the latency concern: if the packets you're receiving contain a server-side timestamp (usually 8 bytes for epoch+nsec), then the latency simply doesn't matter. You could receive the packets a day later and would have exactly the same accuracy and precision as if you had received them 1 msec ago. You're not measuring how quickly you can receive DOM updates; you're simply storing time series data.
     
    volpunter likes this.
  7. cjbuckley4

    Right, we're on the same page about latency. I specifically went for feeds that timestamp server side for this reason. I was initially interested in using the TT API, but there you must timestamp client side, and I also found that all the FCMs willing to give me transactional pricing offered TT's coalesced feed rather than the non-coalesced one.
     
  8. hft_boy

    It's unclear exactly what you are trying to accomplish here. Whether you can handle 500+ symbols on your laptop depends on exactly what you are doing and how fast you need it done, and nothing more. All the performance concerns just come down to your needs. If, as you say, you are only concerned about dropped packets, then you should be fine! I assume IQFeed gives you some proprietary code that connects to their server via TCP and exposes an API to you; TCP handles dropped and out-of-order packets for you.

    If millisecond or microsecond latency is important to you (rule of thumb: if you don't know, it probably isn't), then you would probably need a better feed and a much better understanding of networking down to how network cards serialize bits to the wire.

    Logging to disk shouldn't be too much of a performance hit. The main thing to be aware of is to use buffered writes (conceptually similar to keeping a queue of structs, but more standard), because calling the write() syscall is expensive. On Unix you can achieve 'background'** writes by piping the output through an intermediate buffer (e.g. mbuffer, pv, buffer). Another thing to consider is putting a very light compressor (lzop or lz4) between the application and the disk. Counter-intuitively, you can actually get higher throughput, because the CPU cycles spent on compression are offset by the savings in disk bandwidth.

    **'Background' is a bit misleading here. Technically the writes go to the pipe, which is typically implemented in the kernel as a ring buffer, so some memory copies are still performed, but the overhead is much lower and more deterministic than hitting the disk. If you were really enterprising you could avoid that overhead too: manage the buffers in the application and use non-blocking writes, or use multiple threads with some synchronization machinery like a ring buffer, or abuse the garbage collector by maintaining some sort of linked list, or get more creative. Sky's the limit, really.
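    Since you're on C#/Windows rather than Unix pipes, the rough equivalent of both ideas is to wrap the file in a BufferedStream and, optionally, a fast compression stream. GZipStream at its fastest setting is only a stand-in here for lzop/lz4 (those would need a third-party library), but it shows the layering:

    [CODE]
    // Buffered (and optionally lightly compressed) binary logging in C#.
    using System.IO;
    using System.IO.Compression;

    public static class LogStreams
    {
        // Buffered writes: bytes accumulate in a 1 MB in-memory buffer and
        // reach the OS in large chunks instead of one call per record.
        public static BinaryWriter OpenBuffered(string path)
        {
            var file = new FileStream(path, FileMode.Append, FileAccess.Write);
            return new BinaryWriter(new BufferedStream(file, 1 << 20));
        }

        // The same thing with a light compressor between application and disk.
        public static BinaryWriter OpenCompressed(string path)
        {
            var file = new FileStream(path, FileMode.Append, FileAccess.Write);
            var gzip = new GZipStream(file, CompressionLevel.Fastest);
            return new BinaryWriter(new BufferedStream(gzip, 1 << 20));
        }
    }
    [/CODE]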
     
    Neilsome and cjbuckley4 like this.
  9. cjbuckley4

    Thanks for the input @hft_boy. Yes, at present latency is not my concern. I won't be doing any live trading, merely collecting order book data for research purposes. I will worry about what latency I need to achieve once I have an idea of what my strategies will require...horse before the cart. I don't think it would be wise for someone in my position to play low-latency games for a while, but my feeling is that order book data probably contains relevant information beyond the next few milliseconds, and I intend to find out what that information may be.

    In reading more about this problem, I've found that most feeds buffer messages so you can process them as you receive them; I assume this may be inherent to TCP connections, as you mentioned. You only actually lose data when your backlog outruns their buffer, so my concerns about processing power and bandwidth were assuaged quite a bit by that. I have also decided it may be best to skip parsing the IQFeed data into a struct altogether and simply add the data to the queue as it comes off the socket, then write it to binary files in a separate thread. That way I can do away with an entire (not very intensive) step and just parse the binary files into teafiles later on my home machine. I'm starting to believe the hardware requirements here won't be as intense as I previously imagined. Forgive me if my understanding is off, but even if I received the full 1 Gbps that a standard port can carry (roughly 125 MB/s), I could hold it in a queue and write it to disk at about 200 MB/s, so as long as the queue can absorb any short bursts, it should be a total nonissue. Furthermore, I clearly have no idea about networking, but I highly doubt IQFeed is sending anywhere near 1 Gbps over the public internet to each of their subscribers. I'm a noob here, so tell me if what I'm ballparking is remotely accurate.
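    Roughly what I have in mind for the skip-the-parse version (just a sketch, not my actual code):

    [CODE]
    // Sketch: raw socket bytes go straight into a queue, and a separate
    // thread appends them to the day's capture file.
    using System.Collections.Concurrent;
    using System.IO;
    using System.Net.Sockets;

    public sealed class RawCapture
    {
        private readonly BlockingCollection<byte[]> _chunks = new BlockingCollection<byte[]>();

        // Reader: pull bytes off the feed socket and enqueue a copy of each chunk.
        public void ReadLoop(Socket feed)
        {
            var buffer = new byte[64 * 1024];
            int n;
            while ((n = feed.Receive(buffer)) > 0)
            {
                var chunk = new byte[n];
                System.Buffer.BlockCopy(buffer, 0, chunk, 0, n);
                _chunks.Add(chunk);
            }
            _chunks.CompleteAdding();      // feed closed; let the writer finish
        }

        // Writer (runs on its own thread): drain the queue, append to disk.
        public void WriteLoop(string path)
        {
            using (var file = new FileStream(path, FileMode.Append, FileAccess.Write))
            using (var output = new BufferedStream(file, 1 << 20))
            {
                foreach (var chunk in _chunks.GetConsumingEnumerable())
                    output.Write(chunk, 0, chunk.Length);
            }
        }
    }
    [/CODE]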

    Thanks for your helpful feedback.

    EDIT: I will also research the buffered writes you alluded to, either instead of or complementing the queue.
     
    Last edited: Jun 19, 2015
  10. hft_boy

    Good to see you're tackling these problems head on.

    Yes, it's unlikely you're getting 1 Gbps over the internet, or even that your ISP is sending you more than 100 Mbps. If you are concerned about disk bandwidth, you can get cheap SSDs these days that do 400-500 MB/s sustained sequential writes.

    Cheers and feel free to PM me if you have specific questions.
     
    #10     Jun 20, 2015
    cjbuckley4 likes this.