I am implementing custom server that needs to maintain very large number (100K or more) of long lived connections. Server simply passes messages between sockets and it doesn't do any serious data processing. Messages are small, but many of them are received/send every second. Reducing latency is one of the goals. I realize that using multiple cores won't improve performance and therefore I decided to run the server in a single thread by calling run_one
or poll
methods of io_service
object. Anyway multi-threaded server would be much harder to implement.
What are the possible bottlenecks? Syscalls, bandwidth, completion queue / event demultiplexing? I suspect that dispatching handlers may require locking (that is done internally by asio library). Is it possible to disable even queue locking (or any other locking) in boost.asio?
EDIT: related question. Does syscall performance improve with multiple threads? My feeling is that because syscalls are atomic/synchronized by the kernel adding more threads won't improve speed.
Using boost::asio you can write single-thread or multi-thread server approximately at same development cost. You can write single-threaded version as first version, then convert it to multithreaded, if needed.
If the run() method is called on an object of type boost::asio::io_service, the associated handlers are invoked on the same thread. By using multiple threads, an application can call multiple run() methods simultaneously.
The product I work on uses Boost ASIO (TCP) for network communication. During a test I noticed something very strange: ASIO would send at ~60MB/s for 13 seconds and then drop to ~300K/s for 9 seconds then it would then go back to ~60MB/s for 13 seconds; this would repeat for the entire transfer.
The project is sponsored by Automatak. The stack uses a completely asynchronous design. The core library is written in C++11 and has abstracted execution services, timers, and physical layers called the Platform Abstraction Layer (PAL).
You might want to read my question from a few years ago, I asked it when first investigating the scalability of Boost.Asio while developing the system software for the Blue Gene/Q supercomputer.
Scaling to 100k or more connections should not be a problem, though you will need to be aware of the obvious resource limitations such as the maximum number of open file descriptors. If you haven't read the seminal C10K paper, I suggest reading it.
After you have implemented your application using a single thread and a single io_service
, I suggest investigating a pool of threads invoking io_service::run()
, and only then investigate pinning an io_service
to a specific thread and/or cpu. There are multiple examples included in the Asio documentation for all three of these designs, and several questions on SO with more information. Be aware that as you introduce multiple threads invoking io_service::run()
you may need to implement strand
s to ensure the handlers have exclusive access to shared data structures.
Using boost::asio you can write single-thread or multi-thread server approximately at same development cost. You can write single-threaded version as first version, then convert it to multithreaded, if needed.
Typically, only bottleneck for boost::asio is that epoll/kqueue reactor is working in a mutex. So, only one thread is doing epoll at same time. This can decrease performance in case when you have multithreaded server, which serves lots and lots very small packets. But, imo it anyway should be faster than just plain-singlethread server.
Now about your task. If you want to just pass messages between connections - i think it must be multithreaded server. The problem is syscalls(recv/send etc). An instruction is very easy think to do for CPU, but any syscall is not very "light" operation (everything is relative, but relative to other jobs in your task). So, with single thread you will get big syscalls overhead, its why i recommend to use multithreaded scheme.
Also, you can separate io_service and make it work as "io_service per thread" idiom. I think this must give best performance, but it has drawback: if one of io_service will get too big queue - other threads will not help it, so some connections may slowdown. On other side, with single io_service - queue overrun can lead to big locking overhead. All you can do - do the both variants and measure bandwidth/latency. It should be not too difficult to implement both variants.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With