Can Boost.Asio be used to build low-latency applications, such as HFT (high-frequency trading)?
Boost.Asio uses the optimal platform-specific demultiplexing mechanism: IOCP, epoll, kqueue, poll_set, or /dev/poll.
It can also be used with an Ethernet adapter that has a TOE (TCP/IP offload engine) and with OpenOnload (kernel-bypass BSD sockets).
But can a low-latency application be built using Boost.Asio + TOE + OpenOnload?
Boost.Asio is a cross-platform C++ library for network and low-level I/O programming that provides developers with a consistent asynchronous model using a modern C++ approach.
For me, the main advantage of Boost.Asio (besides being cross-platform) is that on each platform it uses the most effective strategy (epoll on Linux 2.6, kqueue on FreeBSD/MacOSX, overlapped I/O on MS Windows).
This is the advice from the Asio author, posted to the public SG-14 Google Group (which unfortunately is having issues, and they have moved to another mailing list system):
I do work on ultra low latency financial markets systems. Like many in the industry, I am unable to divulge project specifics. However, I will attempt to answer your question.
In general:
At the lowest latencies you will find hardware based solutions.
Then: Vendor-specific kernel bypass APIs. For example where you encode and decode frames, or use a (partial) TCP/IP stack implementation that does not follow the BSD socket API model.
And then: Vendor-supplied drop-in (i.e. LD_PRELOAD) kernel bypass libraries, which re-implement the BSD socket API in a way that is transparent to the application.
Asio works very well with drop-in kernel bypass libraries. Using these, Asio-based applications can implement standard financial markets protocols, handle multiple concurrent connections, and expect median 1/2 round trip latencies of ~2 usec, low jitter and high message rates.
My advice to those using Asio for low latency work can be summarised as: "Spin, pin, and drop-in".
Spin: Don't sleep. Don't context switch. Use io_service::poll() instead of io_service::run(). Prefer single-threaded scheduling. Disable locking and thread support. Disable power management. Disable C-states. Disable interrupt coalescing.
Pin: Assign CPU affinity. Assign interrupt affinity. Assign memory to NUMA nodes. Consider the physical location of NICs. Isolate cores from general OS use. Use a system with a single physical CPU.
Drop-in: Choose NIC vendors based on performance and availability of drop-in kernel bypass libraries. Use the kernel bypass library.
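As an illustration of the "spin" and "pin" advice (this sketch is not part of the original post), the loop below pins the calling thread to one CPU and then busy-polls instead of blocking. It assumes Linux with glibc. The names pin_current_thread and spin are hypothetical, and spin is written as a template so the sketch compiles without Boost; the intent is to pass it a boost::asio::io_context (the newer name for io_service), whose poll() runs ready handlers without blocking.

```cpp
#include <atomic>
#include <cstddef>
#include <sched.h>  // sched_setaffinity, cpu_set_t, CPU_* macros (Linux/glibc)

// Pin the calling thread to a single CPU. Returns true on success.
// Hypothetical helper for illustration; not part of Asio.
inline bool pin_current_thread(int cpu) {
    if (cpu < 0 || cpu >= CPU_SETSIZE) return false;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    // A pid of 0 applies the mask to the calling thread.
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// Busy-poll an Asio-style context until `stop` becomes true.
// IoContext is any type with a non-blocking `std::size_t poll()`,
// for example boost::asio::io_context. Returns the number of handlers run.
template <class IoContext>
std::size_t spin(IoContext& ctx, const std::atomic<bool>& stop) {
    std::size_t handled = 0;
    while (!stop.load(std::memory_order_relaxed)) {
        handled += ctx.poll();  // runs whatever is ready, never sleeps
    }
    return handled;
}

// Intended use with Boost.Asio (assumed, not compiled in this sketch):
//   boost::asio::io_context ctx(1);   // concurrency hint 1: single-threaded use
//   std::atomic<bool> stop{false};
//   pin_current_thread(3);            // CPU 3 is an arbitrary example
//   spin(ctx, stop);
```

Two caveats apply. The chosen CPU should be one that is isolated from general OS use, which is a system configuration step outside this code. Also, an io_context with no outstanding work enters the stopped state, so in practice the loop is expected to run with a pending asynchronous operation or a work guard in place. Recent Boost versions also document a BOOST_ASIO_CONCURRENCY_HINT_UNSAFE hint that turns off internal locking, which appears to correspond to the "disable locking" advice.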
This advice is decoupled from the specific protocol implementation being used. Thus, as a Beast user you could apply these techniques right now, and if you did you would have an HTTP implementation with ~10 usec latency (N.B. number plucked from air, no actual benchmarking performed). Of course, a specific protocol implementation should still pay attention to things that may affect latency, such as encoding and decoding efficiency, memory allocations, and so on.
As far as the low latency space is concerned, the main things missing from Asio and the Networking TS are:
Batching datagram syscalls (i.e. sendmmsg, recvmmsg).
Certain socket options.
These are not included because they are (at present) OS-specific and not part of POSIX. However, Asio and the Networking TS do provide an escape hatch, in the form of the native_*() functions and the "extensible" type requirements.
Cheers, Chris