I am designing a WebSocket server which receives a message and saves it to an embedded database. For reading the messages from the socket I am using Boost.Asio. To save the messages to the embedded database, I see a few options in front of me:
I am pretty sure the second option is what I want. However, I am not sure how to pass messages from the socket thread to the IO thread. I see the following options:
Also, I believe it would help to have multiple IO threads which receive messages in a round-robin fashion and save them to the embedded database.
Which architecture would be the most performant? Are there any other alternatives from the ones I mentioned?
A few things to note:
I don't think you want to use a unix socket, which is always going to require a system call and pass data through the kernel. That is generally more suitable as an inter-process mechanism than an inter-thread mechanism.
Unless your database API requires that all calls be made from the same thread (which I doubt), you don't have to use a separate boost::asio::io_service for it. I would instead create an io_service::strand on your existing io_service instance and use the strand::dispatch() member function (instead of io_service::post()) for any blocking database tasks. Using a strand in this manner guarantees that at most one thread may be blocked accessing the database, leaving all the other threads in your io_service instance available to service non-database tasks.
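A minimal sketch of that approach might look like the following, assuming a hypothetical blocking save_to_db() function and a handler that is invoked once a message has been read from the socket:

```cpp
#include <boost/asio.hpp>
#include <string>

boost::asio::io_service io;                     // the single io_service shared by all handlers
boost::asio::io_service::strand db_strand(io);  // serializes all database work

void save_to_db(const std::string& msg);        // hypothetical blocking database call

void on_message(const std::string& msg)
{
    // dispatch() runs the handler immediately if no other handler is currently
    // executing in the strand (possibly avoiding a context switch), otherwise
    // it queues it; either way at most one thread touches the database at a time.
    db_strand.dispatch([msg]() {
        save_to_db(msg);
    });
}
```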
Why might this be better than using a separate io_service instance? One advantage is that having a single instance with one set of threads is slightly simpler to code and maintain. Another minor advantage is that strand::dispatch() will execute the handler in the current thread if it can (i.e. if no task is already running in the strand), which may avoid a context switch.
For the ultimate optimization I would agree that using a specialized queue whose enqueue operation cannot make a system call could be fastest. But given that you have network i/o by producers and disk i/o by consumers, I don't see how the implementation of the queue is going to be your bottleneck.
After benchmarking/profiling I found the Facebook Folly implementation of MPMCQueue to be the fastest, by at least a 50% margin. If I use the non-blocking write method, the socket thread has almost no overhead and the IO threads remain busy. The number of system calls is also much lower than with the other queue implementations.
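For reference, the hand-off described above looks roughly like this; it is only a sketch, with a hypothetical save_to_db(), an arbitrary queue capacity, and error handling omitted:

```cpp
#include <folly/MPMCQueue.h>
#include <string>

folly::MPMCQueue<std::string> queue(4096);   // bounded multi-producer/multi-consumer queue

void save_to_db(const std::string& msg);     // hypothetical blocking database call

// Socket thread: non-blocking write, so the network loop is never stalled.
void on_message(std::string msg)
{
    if (!queue.write(std::move(msg))) {
        // queue is full; drop, retry, or apply backpressure here
    }
}

// IO thread(s): block until a message is available, then persist it.
void io_thread_loop()
{
    std::string msg;
    for (;;) {
        queue.blockingRead(msg);
        save_to_db(msg);
    }
}
```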
The SPSC queue with a condition variable in Boost is slower; I am not sure why. It might have something to do with the adaptive spinning that the Folly queue uses.
Also, message passing (UDP domain sockets in this case) turned out to be orders of magnitude slower, especially for larger messages. This might have something to do with the data being copied twice.