Server options

Discussion in 'Automated Trading' started by cjbuckley4, Jun 17, 2015.

  1. i960

    write(2) will still effectively be buffered unless one explicitly opens the FD with O_DIRECT. The main performance hit will be the context switch from crossing the userland/kernel barrier on each syscall. None of that is needed though, as the typical stdio.h routines (fwrite, fprintf, etc.) are all buffered. I honestly don't think the disk I/O will even be a concern here, as the data will have been reduced from the network side into smaller units. Even when the stdio routines (and write(), for that matter) flush their buffers, there's still the filesystem cache with its own buffering. In short, the writing out of data will hit multiple buffers and be done efficiently by the OS for the most part.
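    The same layering applies on the OP's C# side: FileStream keeps its own user-space buffer on top of the OS page cache, so per-tick writes don't turn into per-tick syscalls. A minimal sketch, assuming an arbitrary file name, buffer size, and record layout (none of which come from this thread):

        using System;
        using System.IO;

        class BufferedWriteDemo
        {
            static void Main()
            {
                // FileStream buffers writes in user space (here 64 KiB) before
                // issuing a syscall; the kernel page cache buffers again below that.
                using (var fs = new FileStream("ticks.bin", FileMode.Create,
                                               FileAccess.Write, FileShare.None,
                                               bufferSize: 64 * 1024))
                using (var bw = new BinaryWriter(fs))
                {
                    for (int i = 0; i < 1000000; i++)
                    {
                        bw.Write(DateTime.UtcNow.Ticks); // timestamp
                        bw.Write(100.25);                // price
                        bw.Write(10);                    // size
                    }
                } // Dispose flushes the user-space buffer; the OS flushes its cache later.
            }
        }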

    Any reasonably modern Unix kernel will buffer pages to be written to backing store via a page cache, and the kernel will handle that on its own (Linux users can see metrics on this in /proc/meminfo). Where it would be an issue is if the buffers cannot be flushed faster than the caller is writing to them; I doubt that'll be the case. WRT non-blocking file I/O: use either threads or async I/O (aio.h), since regular files always report ready and O_NONBLOCK has no real effect on them.

    Best bet here is to write a coarse prototype, get it working, make it correct, then profile it to see where the actual bottlenecks are. My bet is it'll be entirely I/O bound on the network.
     
    #11     Jun 20, 2015
  2. i960

    Don't do this. For one, you're already prematurely optimizing here. Secondly, that intermediate struct is your friend. Serialization from the read buffers into an intermediate struct abstracts the data into an atomic unit you can pass around at very little cost. Writing it direct to disk via a separate thread would already require a struct or class to be placed into a thread-aware queue anyway, and if you just pass it a buffer of temporary bytes, or even the FD itself, you're not really saving any time or resources here; you're just shifting them around. Remember, the stack is queuing everything received by the network driver into a receive queue (run netstat -an on a Linux host and you'll see it in the Recv-Q/Send-Q columns). The space remaining in this queue is communicated to the sender via TCP windowing to provide flow control; you don't need to worry about it unless you're filling up queues consistently.
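    In C# terms, the "struct plus thread-aware queue" design is only a few lines: a BlockingCollection<T> hands parsed ticks from the socket thread to a dedicated writer thread. A rough sketch, with a made-up Tick layout for illustration:

        using System;
        using System.Collections.Concurrent;
        using System.IO;
        using System.Threading.Tasks;

        struct Tick
        {
            public long TimestampTicks;
            public double Price;
            public int Size;
        }

        class TickRecorder
        {
            // Bounded queue: the receive path blocks only if the writer falls behind.
            readonly BlockingCollection<Tick> queue = new BlockingCollection<Tick>(65536);

            public void Enqueue(Tick t) => queue.Add(t); // called from the socket thread

            public Task StartWriter(string path) => Task.Run(() =>
            {
                using (var bw = new BinaryWriter(File.Create(path)))
                {
                    foreach (var t in queue.GetConsumingEnumerable()) // blocks until data arrives
                    {
                        bw.Write(t.TimestampTicks);
                        bw.Write(t.Price);
                        bw.Write(t.Size);
                    }
                }
            });

            public void Complete() => queue.CompleteAdding(); // lets the writer drain and exit
        }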

    If you instead use neither a separate thread nor a separate struct, and just write straight to storage from the function processing the input data off the stack, you now have a function that's tightly coupled to both network input and storage output. That's not good design, nor is it really saving *that* much in the grand scheme of things.

    Write it straightforward. Since you haven't worked directly with Berkeley-style sockets and non-blocking I/O, I can *guarantee you* that the initial 90% of your time is going to be spent trying to figure out how to even do it correctly. It's not rocket science, but it's not hello-world stuff either. There are multiple ways to hang yourself while learning it, and that's going to take the vast majority of your initial time.

    You will not see line rate from an off-the-shelf network card. You may see 600-800 Mbps at best, but you'll never see a flat-out 1 Gbps (nor would you even be able to receive that without direct switch port connectivity). Not gonna happen. On top of that, the amount of data sent over the wire (actual bandwidth required) is not the same as the bandwidth needed for storage: you're going to be taking data of lower density and higher frequency and distilling it down to a more efficient final form. What benefits you most here is if the provider can give you raw binary data and a protocol API of some sort, which reduces the amount of assembly/reassembly and bandwidth needed on both sides. It is more proprietary though, so finding a "retail" provider who will do this might be more difficult.

    Also, you might consider talking to the Nanex people about this, as I'm sure they have direct experience with this type of thing (NxCore?). On top of that, consider this as well: http://www.cmegroup.com/market-data/distributor/market-data-platform.html
     
    Last edited: Jun 20, 2015
    #12     Jun 20, 2015
  3. cjbuckley4

    Thanks for the link from CME. I'm obviously not watching the full depth nor every future in each subcategory, but it still helps me establish expectations.

    With regard to parsing to a struct, you may be right, but first let me make sure we're clear here. I'm using System.Net.Sockets to receive the byte[]; I could write that directly to a plain unstructured binary file with BinaryWriter. Alternatively, I could use Encoding.ASCII.GetString(message, 0, bytesRead) to convert the incoming message into a string, which I would then parse into a struct and write to a TeaFile-structured binary file via the .NET API. My concern was not actually whether parsing into a struct would be too computationally intensive; it was whether TeaFile.Write() would be able to achieve the same I/O as BinaryWriter, since I could find no info on it but plenty on the performance of BinaryWriter. I realize that TeaFile.Write() is probably just a thin wrapper around the underlying C# (or ultimately C) mechanisms for writing to a file, but I was still unsure, so I wanted to play it safe.

    Parsing server-side is undoubtedly easier since I already wrote all the code, and, as I start to understand the buffering of sockets above and the maximum possible network I/O versus what sort of write performance is easily achievable, I've become much less concerned about where I parse it and how much horsepower I actually need. I've also found some server providers who will let me scale up without much trouble and quit with no commitment, so I can start small and take baby steps toward more horsepower as needed. I spoke to Rithmic, and their feed doesn't have any parsing involved, which to me says that the data arrives in a structured format already. Based on that, it might be easier to keep everything I get via FTP from the server normalized to one format anyway, so using some kind of structure in my IQFeed reader is going to be necessary if I go that route.
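    For what it's worth, the decode-then-parse path described above looks roughly like this in C#. The comma-delimited "symbol,price,size" layout is a placeholder for illustration, not IQFeed's actual message format:

        using System;
        using System.Globalization;
        using System.IO;
        using System.Text;

        class FeedParser
        {
            // Hypothetical layout: "symbol,price,size" -- the real IQFeed wire
            // format differs and carries many more fields.
            public static void HandleMessage(byte[] buffer, int bytesRead, BinaryWriter bw)
            {
                string line = Encoding.ASCII.GetString(buffer, 0, bytesRead);
                string[] fields = line.Split(',');

                // Parse into struct-like locals, then persist as binary.
                long stamp = DateTime.UtcNow.Ticks; // local receive timestamp
                double price = double.Parse(fields[1], CultureInfo.InvariantCulture);
                int size = int.Parse(fields[2]);

                bw.Write(stamp);
                bw.Write(fields[0]); // symbol (length-prefixed string)
                bw.Write(price);
                bw.Write(size);
            }
        }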

    With regard to NxCore, I've received several recommendations to go that route in the past, but have been hesitant for a number of reasons. I believe Rithmic is the best feed available to retail traders for futures data that arrives timestamped, and because I plan to use IQFeed to ultimately watch 500-1000 backup order books on the futures I watch, futures options, futures Rithmic doesn't cover, ETPs with futures as their basis, and the equity Principal Components of a few index futures, I see no reason to pay more for the NxCore feed, which timestamps at a lower granularity than IQFeed. My opinion may change as I decide to watch more instruments and my pockets get deeper (remember, we have a college student here). Thanks for your excellent advice!
     
    #13     Jun 20, 2015
  4. hft_boy

    Good point about the buffer cache. Totally agree; it is best to implement first -- and then, if the disk (or insert X here) is a bottleneck, go crazy trying to optimize.
     
    #14     Jun 20, 2015
    cjbuckley4 likes this.
  5. You are concerning yourself with the wrong issues. There are no hardware or network bandwidth issues you need to concern yourself with; any potential bottleneck is to be avoided on the software side. Any standard .NET TCP client does fine for your purposes. The main issue I would focus on is how to store data in memory and infrequently persist it to disk. I have not yet read any posts other than this first one of yours.

     
    #15     Jun 21, 2015
  6. Why should he re-write anything in another language? C# is perfectly capable of handling this. He can write to memory while another thread infrequently takes a copy of a certain chunk of data, serializes it, and writes it to disk. A circular buffer comes to mind, but there are tons of other simple solutions, like the double-buffer sketch below. Why make life complicated when it can be easy? (I will refer to .NET TPL Dataflow in later comments, which is the perfect solution for such a scenario.)
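    One way to read the "write to memory, flush a chunk from another thread" idea is a double buffer: the feed thread appends to an in-memory buffer, and a flush thread periodically swaps buffers and writes the full one to disk. A sketch only, with arbitrary sizes and no overflow handling if the feed outruns the flush:

        using System;
        using System.IO;
        using System.Threading;

        class ChunkedRecorder
        {
            byte[] active = new byte[1 << 20]; // buffer being filled by the feed thread
            byte[] spare  = new byte[1 << 20]; // buffer available for the next swap
            int used;
            readonly object gate = new object();

            // Feed thread: append raw bytes to the in-memory buffer.
            public void Append(byte[] data, int count)
            {
                lock (gate)
                {
                    Buffer.BlockCopy(data, 0, active, used, count);
                    used += count;
                }
            }

            // Flush thread: swap buffers under the lock, write the full one outside it.
            public void FlushLoop(Stream output, CancellationToken ct)
            {
                while (!ct.IsCancellationRequested)
                {
                    Thread.Sleep(1000); // "infrequently" persist, e.g. once a second
                    byte[] full; int len;
                    lock (gate)
                    {
                        full = active; len = used;
                        active = spare; used = 0;
                        spare = full;
                    }
                    output.Write(full, 0, len); // disk I/O happens off the feed thread
                }
            }
        }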

     
    Last edited: Jun 21, 2015
    #16     Jun 21, 2015
  7. Don't waste time re-inventing the wheel when you already have something in C#. Have you looked at TPL Dataflow? It would be the perfect approach for your situation, and it is probably as performant as any C++ solution given you are not latency-dependent. You can post your incoming socket data to a dataflow component and do all the deserialization/serialization work there; you can, with one simple option, specify whether to do all of that on multiple threads at the same time; and you can fan out data, merge data, or simply perform operations on the data and move the results to the next block. You should really take a look at TPL Dataflow, because this is exactly what the technology aims to solve.
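    A minimal sketch of that pipeline shape. The TransformBlock/ActionBlock/LinkTo/Post calls are the real System.Threading.Tasks.Dataflow API; the message contents and file layout are stand-ins:

        using System;
        using System.IO;
        using System.Text;
        using System.Threading.Tasks.Dataflow;

        class DataflowPipeline
        {
            static void Main()
            {
                var writer = new BinaryWriter(File.Create("ticks.bin"));

                // Stage 1: decode raw socket buffers, on up to 4 threads at once.
                var decode = new TransformBlock<byte[], string>(
                    bytes => Encoding.ASCII.GetString(bytes),
                    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

                // Stage 2: persist, single-threaded so file writes stay ordered.
                var persist = new ActionBlock<string>(line =>
                {
                    writer.Write(DateTime.UtcNow.Ticks); // stamped at write time (illustrative)
                    writer.Write(line);                  // length-prefixed string
                });

                decode.LinkTo(persist, new DataflowLinkOptions { PropagateCompletion = true });

                // The socket receive loop just posts each buffer and moves on:
                decode.Post(Encoding.ASCII.GetBytes("ES,2095.25,10"));

                decode.Complete();         // no more input
                persist.Completion.Wait(); // drain the pipeline
                writer.Dispose();
            }
        }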

     
    Last edited: Jun 21, 2015
    #17     Jun 21, 2015
    cjbuckley4 likes this.
  8. Exactly, no need to talk about latency here.

     
    #18     Jun 21, 2015
  9. cjbuckley4

    Thanks @volpunter, you're correct about where I should be concerning myself, I think. No need to read it all, really; the rest of the thread is pretty much me coming to that realization with the help of others on here. Although IQFeed doesn't send any data I'm aware of (or interested in) over the weekend, I got a cheap starter virtual "dedicated" server up and running, and everything seems to be working as planned. The real testing will start in a few hours when things heat up and I get my first real load on the system server-side. I'm not even 100% sure what my full watch list will include at the end of the day, but I have some ES and GE contracts on there that I'm sure will give me a good indication of how I'll do with more active contracts.

    As you can see from reading through, though, I'm really not worried at all at this point. I think I'll probably be able to handle everything IQFeed throws at me, and most likely Rithmic as well, on an $80-120/month virtual "dedicated" server. The only concern I have with the VDS approach is that I don't know exactly how writing to the hard disk will work in that kind of environment. I will move to a dedicated server if it becomes an issue, though.

    EDIT: just saw your last post about TPL. I have not looked into it, but it does sound like a possibility. I'll look into it. Thanks for the suggestion.
     
    Last edited: Jun 21, 2015
    #19     Jun 21, 2015
  10. Agree, it's very important to deserialize the data and create your own data structures (struct or class objects), which you then later serialize and persist to disk.

    I can show how this whole solution is done with C# TPL Dataflow in less than 200 lines of code and without the slightest bottleneck on the implementation side. You (i960) make it sound a lot harder than it really is. With TPL Dataflow you can simply have the socket write incoming data async to the first dataflow component, and the rest is done right away on different threads. You would never even have to start concerning yourself with blocking I/O. With dataflow blocks in C#, I am absolutely sure that a stock i5 machine with 8 GB of memory and an SSD will be fully sufficient to handle any traffic up to the limits of the network card's capacity.

     
    #20     Jun 21, 2015
    jtrader33 likes this.