Btw, for the discriminating S&M practitioner: OpenFabrics Alliance. Oracle's take. EPAM's pitch. An academic paper. State machine replication for high availability using RDMA. Another pitch, this time from Arista Networks. Yada, yada, yada...

Code:
Package: librdmacm-dev
Version: 22.1-1
State: not installed
Multi-Arch: same
Priority: optional
Section: libdevel
Maintainer: Benjamin Drung <benjamin.drung@cloud.ionos.com>
Architecture: amd64
Uncompressed Size: 325 k
Depends: libibverbs-dev, librdmacm1 (= 22.1-1)
Description: Development files for the librdmacm library
 librdmacm is a library that allows applications to set up reliable connected
 and unreliable datagram transfers when using RDMA adapters. It provides a
 transport-neutral interface in the sense that the same code can be used for
 both InfiniBand and iWARP adapters. The interface is based on sockets, but
 adapted for queue pair (QP) based semantics: communication must use a
 specific RDMA device, and data transfers are message-based.

 librdmacm only provides communication management (connection setup and
 tear-down) and works in conjunction with the verbs interface provided by
 libibverbs, which provides the interface used to actually transfer data.

 This package is needed to compile programs against librdmacm1. It contains
 the header files and static libraries (optionally) needed for compiling.
Homepage: https://github.com/linux-rdma/rdma-core
Tags: devel::library, role::devel-lib

Code:
Package: libibverbs-dev
Version: 22.1-1
State: not installed
Multi-Arch: same
Priority: optional
Section: libdevel
Maintainer: Benjamin Drung <benjamin.drung@cloud.ionos.com>
Architecture: amd64
Uncompressed Size: 1,397 k
Depends: ibverbs-providers (= 22.1-1), libibverbs1 (= 22.1-1), libnl-3-dev, libnl-route-3-dev
Description: Development files for the libibverbs library
 libibverbs is a library that allows userspace processes to use RDMA "verbs"
 as described in the InfiniBand Architecture Specification and the RDMA
 Protocol Verbs Specification. iWARP ethernet NICs support RDMA over
 hardware-offloaded TCP/IP, while InfiniBand is a high-throughput,
 low-latency networking technology. InfiniBand host channel adapters (HCAs)
 and iWARP NICs commonly support direct hardware access from userspace
 (kernel bypass), and libibverbs supports this when available.

 This package is needed to compile programs against libibverbs1. It contains
 the header files and static libraries (optionally) needed for compiling.
Homepage: https://github.com/linux-rdma/rdma-core
Tags: devel::library, role::devel-lib
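And to give a rough idea of what the "communication management" part of librdmacm looks like in code, here is a minimal, untested client-side sketch in the style of the rdma_client example that ships with rdma-core. The address, port, and message are placeholders, error handling is reduced to bail-out exits, and it assumes the peer has posted a matching receive.

Code:
/* Minimal librdmacm client sketch (untested).
 * Build with: gcc client.c -lrdmacm -libverbs
 * Address/port below are placeholders for a hypothetical RDMA peer. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <rdma/rdma_cma.h>
#include <rdma/rdma_verbs.h>

int main(void)
{
    struct rdma_addrinfo hints = { 0 }, *res;
    struct ibv_qp_init_attr attr = { 0 };
    struct rdma_cm_id *id;
    struct ibv_mr *mr;
    struct ibv_wc wc;
    char buf[64] = "hello over RDMA";

    hints.ai_port_space = RDMA_PS_TCP;              /* reliable connected QP */
    if (rdma_getaddrinfo("192.168.1.10", "7471", &hints, &res))  /* placeholders */
        exit(1);

    attr.cap.max_send_wr = attr.cap.max_recv_wr = 1;
    attr.cap.max_send_sge = attr.cap.max_recv_sge = 1;
    attr.sq_sig_all = 1;
    if (rdma_create_ep(&id, res, NULL, &attr))      /* CM id + QP in one call */
        exit(1);

    mr = rdma_reg_msgs(id, buf, sizeof(buf));       /* register the send buffer */
    if (!mr || rdma_connect(id, NULL))              /* connection setup only; data path is verbs */
        exit(1);

    /* Assumes the remote side has posted a receive for this message. */
    if (rdma_post_send(id, NULL, buf, sizeof(buf), mr, 0))
        exit(1);
    while (rdma_get_send_comp(id, &wc) == 0)        /* wait for the send completion */
        ;

    rdma_disconnect(id);
    rdma_dereg_mr(mr);
    rdma_destroy_ep(id);
    rdma_freeaddrinfo(res);
    return 0;
}

Note how librdmacm only handles addressing and connection setup; the actual transfer and completion handling go through the verbs layer, exactly as the package description says.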
Yes, it does make sense. Been there, done that. It's not unusual for HFT firms to do such things. Some even run their fiber cables as straight and short as possible, because a shorter run means less propagation delay (light in fiber covers only about 20 cm per nanosecond). HFT firms work at the sub-microsecond level, and there are 1,000,000 microseconds in one second.
These are cutting-edge techniques from ~2010. By the time big institutions are doing this, it's old hat. https://access.redhat.com/sites/def...olarflare_openonload_performance_brief_10.pdf FPGAs, microwave networks, lasers (which never really took off): all old by now.
Is this beyond (or independent of) the link speed? Why not simply upgrade from, say, a current 1 Gbit link to a 2.5, 10, 25, 40, 50, or even 100+ Gbit link instead of fiddling with bypassing the TCP/IP stack?
Bandwidth is not speed; it's the amount of data you can transfer per unit of time. Once you have enough bandwidth, adding more won't make anything arrive sooner; what matters here is latency. Besides, HFT firms usually have colocation infrastructure, so they already have direct links to the destination (exchanges, market data sources) via fiber cross-connects.
I guess the question is: what is the latency of the existing TCP/IP stack that comes with the OS, including the network card driver etc.? You read from an already connected TCP socket; what is the latency from the moment a byte of data first enters the network card until it is received in your app? It has to go through a few OS layers... I don't know the answer myself, it's not my area of expertise; maybe it's 100 nanoseconds, maybe it's 100 microseconds. Can this latency be noticeably reduced by direct access to the network card? I'm guessing it's large enough to make a difference.
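For a rough feel of one slice of that path, here is an untested sketch (the port is a placeholder) that asks the Linux kernel for a software receive timestamp on a UDP socket and compares it to the time the application actually reads the packet. It only covers the kernel-to-application portion; measuring from the wire onwards needs hardware timestamping (SO_TIMESTAMPING) and a NIC/driver that supports it.

Code:
/* Crude latency probe sketch (untested): kernel software rx timestamp vs. app read time. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    int on = 1;
    struct sockaddr_in addr = { 0 };
    char data[2048], ctrl[512];
    struct iovec iov = { data, sizeof(data) };
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;
    struct timespec now, *krx;

    addr.sin_family = AF_INET;
    addr.sin_port = htons(9999);                 /* placeholder port */
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        exit(1);
    /* Ask the kernel to attach a software timestamp to each received packet. */
    if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPNS, &on, sizeof(on)) < 0)
        exit(1);

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl;
    msg.msg_controllen = sizeof(ctrl);

    for (;;) {
        if (recvmsg(fd, &msg, 0) < 0)
            exit(1);
        clock_gettime(CLOCK_REALTIME, &now);     /* app-side receive time */
        for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
            if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_TIMESTAMPNS) {
                krx = (struct timespec *)CMSG_DATA(cmsg);
                long ns = (now.tv_sec - krx->tv_sec) * 1000000000L
                        + (now.tv_nsec - krx->tv_nsec);
                printf("kernel rx -> app: %ld ns\n", ns);
            }
        }
        msg.msg_controllen = sizeof(ctrl);       /* reset control buffer for next recvmsg */
    }
}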
A lot of times it's just a matter of experience and perspective. At some point you just get used to doing things without the networking protocol (for example), and then you gain a few nanoseconds and CPU cycles without thinking about it. Have you heard of "linguistic determinism"? In a nutshell: you abstract it and get used to it. I really like the RDMA interconnect for high-availability clustering. Your apps can still use whatever they want/need if it makes sense. Spoiler alert: they got a 2x improvement.
This is common verbiage for saying that you bypass the kernel's TCP/IP stack by using a userspace driver instead. This significantly improves latency, noticeably increases the maximum packet rate, and is common practice in this space. The performance limitations of the kernel network stack are well documented. See for example:
- https://dl.acm.org/doi/abs/10.1145/3297156.3297242
- https://www.cse.iitb.ac.in/~mythili/os/anno_slides/network_stack_kernel_bypass_slides.pdf
- https://blog.cloudflare.com/kernel-bypass/
- https://access.redhat.com/sites/def...olarflare_openonload_performance_brief_10.pdf
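To make the "userspace driver" point concrete, here is an untested sketch of what the hot path looks like with libibverbs (one of several kernel-bypass options; Solarflare's OpenOnload, DPDK, etc. take different routes). QP setup and posting of receive buffers are deliberately omitted, so this CQ stays idle; the point is only that the receive loop is a plain userspace poll with no system calls per packet.

Code:
/* Kernel-bypass data-path sketch (untested). Build with: gcc poll.c -libverbs */
#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(devs[0]);   /* first RDMA-capable NIC */
    if (!ctx)
        return 1;
    struct ibv_pd *pd = ibv_alloc_pd(ctx);
    struct ibv_cq *cq = ibv_create_cq(ctx, 256, NULL, NULL, 0);
    if (!pd || !cq)
        return 1;

    /* The hot loop: busy-poll the completion queue directly from userspace.
     * With a real QP attached and receives posted, each completion means a
     * message has landed in pre-registered memory with no per-packet kernel
     * involvement at all. */
    struct ibv_wc wc[16];
    for (long spins = 0; spins < 1000000; spins++) {
        int n = ibv_poll_cq(cq, 16, wc);                   /* non-blocking, pure userspace */
        for (int i = 0; i < n; i++) {
            if (wc[i].status == IBV_WC_SUCCESS)
                printf("completion: %u bytes\n", wc[i].byte_len);
        }
    }

    ibv_destroy_cq(cq);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}

The contrast with a blocking read() on a TCP socket is the point: no context switch, no interrupt-driven wakeup, no trip through the kernel's protocol layers on the critical path.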
But how is it used/applied in practice for HFT between a client/bot and the exchange/broker? Does the exchange/broker side need to use such a bypass technique too, or does it just use the standard TCP/IP stack, so that it suffices to use a userspace TCP/IP stack on the client side only? Or is this intended only for one's own market-making center, i.e. for one's own internal/unofficial/non-public exchange or dark pool? And which US exchanges support/offer this technique? Any links to official/corporate info?