 

How to increase throughput of Boost ASIO, UDP client application

I am using the Boost ASIO library to implement a Windows UDP client which needs to be capable of high throughput.

I would like to use asynchronous receive calls so that I can eventually implement a receive timeout, i.e. if no datagrams have been received after a certain amount of time, my application will exit.

My problem is that I see 30% higher data throughput using synchronous receives vs. asynchronous receives. I have observed this issue when running the application on multiple Dell R630 and R710 machines running Windows Server 2008, and even on my Lenovo ThinkPad laptop.

What are the main performance differences between the two code segments below? Is there more overhead in calling ioService.run_one() after each async receive? I am a new user of the Boost library, so any help would be much appreciated!

Synchronous Receive:

socket_->receive_from(boost::asio::buffer(&vector_[0], datagramSize),  
                      endPoint_);

vs.

Asynchronous Receive (with blocking):

err = boost::asio::error::would_block;

socket_->async_receive_from(
    boost::asio::mutable_buffers_1(&vector_[0], datagramSize),
    endPoint_,
    boost::bind(&HandleRead, _1, _2, &err, &bytesReceived));

do
{
    ioService_.run_one();
}
while (err == boost::asio::error::would_block);

Asynchronous Receive Handler Function:

static void HandleRead
(
    const boost::system::error_code& error, 
    std::size_t bytesRead,
    boost::system::error_code* outError, 
    std::size_t* outBytesRead
)
{
    *outError = error;
    *outBytesRead = bytesRead;
}
asked Jun 30 '15 by jbtechie


1 Answer

It shouldn't come as a surprise that the defining property of the async_ family of API functions is that they run asynchronously.

Running anything asynchronously is not - by itself - going to make it faster. In fact, due to scheduling artefacts it might be slower.

The thing is that asynchrony can allow you to do many more things on a small number of threads (e.g. the main thread).

It sounds a bit as if your application doesn't require that kind of multiplexing. If your application indeed consumes a single source of packets as fast as possible, in a linear fashion, then it makes no sense to

  • interpose a (thread-safe) task queue
  • ask the io_service to schedule the tasks across the available service threads¹ (you have only one)
  • coordinate the results back in the form of callbacks. Callbacks frequently lead to object-lifetime hacks, which in turn frequently lead to shared_ptr<>s. All of these are sources of extra delay (reduced locality of reference, more dynamic allocation, etc.).

Don't use asynchronous mode if you don't need it.

Even if you have a limited number of essentially single-threaded, sequentially running tasks, you will probably get the most throughput by giving each its own thread and its own io_service, avoiding the coordination altogether.

¹ threads running io_service::run or similar

answered Nov 14 '22 by sehe