Boost Asio single threaded performance

Tags: c++, linux, boost, boost-asio, epoll

I am implementing a custom server that needs to maintain a very large number (100K or more) of long-lived connections. The server simply passes messages between sockets and does not do any serious data processing. Messages are small, but many of them are received and sent every second. Reducing latency is one of the goals. I realize that using multiple cores won't improve performance, and therefore I decided to run the server in a single thread by calling the run_one or poll methods of the io_service object. In any case, a multi-threaded server would be much harder to implement.

What are the possible bottlenecks? Syscalls, bandwidth, completion queue / event demultiplexing? I suspect that dispatching handlers may require locking (done internally by the asio library). Is it possible to disable event queue locking (or any other locking) in boost.asio?

EDIT: related question. Does syscall performance improve with multiple threads? My feeling is that because syscalls are atomic/synchronized by the kernel, adding more threads won't improve speed.

570

asked Feb 25 '13 18:02

pic11

2 Answers

You might want to read my question from a few years ago. I asked it when first investigating the scalability of Boost.Asio while developing the system software for the Blue Gene/Q supercomputer.

Scaling to 100k or more connections should not be a problem, though you will need to be aware of the obvious resource limitations such as the maximum number of open file descriptors. If you haven't read the seminal C10K paper, I suggest reading it.
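As a rough illustration of the file descriptor limit mentioned above, the sketch below reads the per-process limit with the POSIX `getrlimit` call and tries to raise the soft limit with `setrlimit`. The helper names `read_fd_limits` and `raise_fd_limit` are made up for this example, and it assumes a POSIX system such as Linux. System-wide limits are a separate matter and are not covered here.

```cpp
#include <sys/resource.h>

// Hypothetical helper: reads the soft and hard limits on open file
// descriptors for the current process. Returns false if getrlimit fails.
bool read_fd_limits(rlim_t& soft, rlim_t& hard) {
    rlimit lim{};
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) return false;
    soft = lim.rlim_cur;
    hard = lim.rlim_max;
    return true;
}

// Hypothetical helper: tries to raise the soft limit to `wanted`, capped at
// the hard limit. Returns true only if the soft limit ends up >= wanted.
bool raise_fd_limit(rlim_t wanted) {
    rlimit lim{};
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) return false;
    if (lim.rlim_cur >= wanted) return true;  // already high enough
    lim.rlim_cur = wanted < lim.rlim_max ? wanted : lim.rlim_max;
    if (setrlimit(RLIMIT_NOFILE, &lim) != 0) return false;
    return lim.rlim_cur >= wanted;
}
```

A server expecting 100K connections could call `raise_fd_limit` once at startup and refuse to start if it returns false, since the hard limit can normally only be raised by a privileged user.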

After you have implemented your application using a single thread and a single io_service, I suggest investigating a pool of threads invoking io_service::run(), and only then investigate pinning an io_service to a specific thread and/or CPU. There are multiple examples included in the Asio documentation for all three of these designs, and several questions on SO with more information. Be aware that as you introduce multiple threads invoking io_service::run() you may need to implement strands to ensure the handlers have exclusive access to shared data structures.

186

answered Nov 09 '22 06:11

Sam Miller

Typically, the only bottleneck in boost::asio is that the epoll/kqueue reactor runs under a mutex, so only one thread is inside epoll at any time. This can reduce performance in a multithreaded server that handles a very large number of very small packets. Still, in my opinion it should be faster than a plain single-threaded server.
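To make the demultiplexing step concrete, here is a small Linux-only sketch of the kind of `epoll` wait that a reactor performs underneath. It is not Asio's implementation. The function name `echo_via_epoll` is invented, and a local socket pair stands in for a real client connection.

```cpp
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstddef>
#include <string>

// Hypothetical demo: sends `msg` into one end of a socket pair and reads it
// back from the other end once epoll reports that end as readable.
std::string echo_via_epoll(const std::string& msg) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return {};
    int ep = epoll_create1(0);
    if (ep < 0) { close(sv[0]); close(sv[1]); return {}; }

    epoll_event ev{};
    ev.events = EPOLLIN;   // interested in "readable" events only
    ev.data.fd = sv[1];

    std::string received;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, sv[1], &ev) == 0 &&
        send(sv[0], msg.data(), msg.size(), 0) ==
            static_cast<ssize_t>(msg.size())) {
        epoll_event ready[8];
        // One wait call can report many ready sockets; here there is only one.
        int n = epoll_wait(ep, ready, 8, 1000 /* ms timeout */);
        for (int i = 0; i < n; ++i) {
            if (ready[i].events & EPOLLIN) {
                char buf[256];
                ssize_t got = recv(ready[i].data.fd, buf, sizeof buf, 0);
                if (got > 0) received.append(buf, static_cast<std::size_t>(got));
            }
        }
    }
    close(ep);
    close(sv[0]);
    close(sv[1]);
    return received;
}
```

In a real server the wait would sit in a loop over thousands of registered descriptors. The point above is that when several threads share one reactor, only one of them can be inside that wait at a time.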

Now, about your task. If you just want to pass messages between connections, I think it should be a multithreaded server. The problem is syscalls (recv/send etc.). A single instruction is very cheap for the CPU, but a syscall is not a "light" operation (everything is relative, but it is heavy relative to the other work in your task). So with a single thread you will get a big syscall overhead, which is why I recommend a multithreaded scheme.
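One way to get a feel for the syscall cost on a given machine is a crude micro-benchmark like the sketch below. It times `send` plus `recv` of a single byte over a local socket pair, which is an assumption made to keep the example self-contained. Real network sockets will behave differently, and the function name `avg_ns_per_send_recv` is invented.

```cpp
#include <sys/socket.h>
#include <unistd.h>
#include <chrono>
#include <cstddef>

// Hypothetical micro-benchmark: average wall-clock nanoseconds for one
// send() + recv() pair of a single byte over a local socket pair.
// Returns -1.0 on any error.
double avg_ns_per_send_recv(std::size_t iterations) {
    int sv[2];
    if (iterations == 0 || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        return -1.0;
    }
    char byte = 'x';
    bool ok = true;
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations && ok; ++i) {
        // Two syscalls per iteration; the payload itself is negligible.
        ok = send(sv[0], &byte, 1, 0) == 1 && recv(sv[1], &byte, 1, 0) == 1;
    }
    auto stop = std::chrono::steady_clock::now();
    close(sv[0]);
    close(sv[1]);
    if (!ok) return -1.0;
    return std::chrono::duration<double, std::nano>(stop - start).count() /
           static_cast<double>(iterations);
}
```

Comparing the result with the time the server spends on its own per-message work gives a rough idea of how much of the budget goes to syscalls. The numbers vary a lot between kernels and hardware, so treat them as indicative only.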

You can also split the work across several io_service objects and use the "io_service per thread" idiom. I think this should give the best performance, but it has a drawback: if one io_service ends up with too long a queue, the other threads will not help it, so some connections may slow down. On the other hand, with a single io_service, a queue overrun can lead to heavy locking overhead. All you can do is implement both variants and measure bandwidth and latency. It should not be too difficult to implement both.

answered Nov 09 '22 04:11

PSIAlt
