We've been using asio in production for years, and recently we reached the point where our servers became loaded just enough to expose a mysterious issue.
In our architecture, each entity that runs independently uses its own strand object. Some of the entities can perform long work (reading from a file, performing a MySQL request, etc.). Naturally, the work is performed within handlers wrapped with the strand. It all sounds nice and should work flawlessly, until we began to notice impossible things: timers expiring seconds after they should even though threads were 'waiting for work', and work being halted for no apparent reason. It looked as if long work performed inside one strand had an impact on other, unrelated strands; not all of them, but most.
Countless hours were spent pinpointing the issue. The trail led to the way the strand object is created: strand_service::construct (here).
For some reason the developers decided to have a limited number of strand implementations, meaning that some totally unrelated objects end up sharing a single implementation and are therefore bottlenecked by each other.
The standalone (non-Boost) asio library uses a similar approach, except that instead of shared implementations, each implementation is independent but may share a mutex object with other implementations (here).
What is this all about? I have never heard of a limit on the number of mutexes in a system, or of any notable overhead related to their creation and destruction. Even if there were, that problem could easily be solved by recycling mutexes instead of destroying them.
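For reference, the scheme described above boils down to something like the following sketch (illustrative only, not the actual Boost source; the real strand_service::construct differs in its locking and hashing details, and the pool size of 193 is the default mentioned in the answer below):

// Simplified sketch of a "fixed pool of shared strand implementations"
// scheme, as described above. Illustrative only.
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>

struct strand_impl_sketch
{
    std::mutex mutex_; // serializes the handlers of *every* strand mapped here
    // ... queued handlers would live here ...
};

class strand_pool_sketch
{
public:
    // 193 is the default pool size reported in the answer below.
    enum { num_implementations = 193 };

    // Every new strand object is mapped onto one of the pooled
    // implementations, so two unrelated strands can land on the same slot
    // and end up serializing each other's handlers.
    strand_impl_sketch & construct(void const * strand_address)
    {
        std::size_t index =
            std::hash<void const *>()(strand_address) % num_implementations;
        std::lock_guard<std::mutex> lock(pool_mutex_);
        if (!implementations_[index])
            implementations_[index].reset(new strand_impl_sketch);
        return *implementations_[index];
    }

private:
    std::mutex pool_mutex_;
    std::unique_ptr<strand_impl_sketch> implementations_[num_implementations];
};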
Here is a minimal test case that shows how dramatic the performance degradation is:
#include <boost/asio.hpp>

#include <atomic>
#include <functional>
#include <iostream>
#include <memory>   // std::shared_ptr, std::unique_ptr
#include <thread>
#include <vector>   // std::vector of worker threads

#include <unistd.h> // sleep()

std::atomic<bool> running{true};
std::atomic<int> counter{0};
// Non-blocking work: each instance owns its own strand and keeps
// re-posting itself through that strand.
struct Work
{
    Work(boost::asio::io_service & io_service)
        : _strand(io_service)
    { }

    static void start_the_work(boost::asio::io_service & io_service)
    {
        std::shared_ptr<Work> _this(new Work(io_service));
        _this->_strand.get_io_service().post(_this->_strand.wrap(std::bind(do_the_work, _this)));
    }

    static void do_the_work(std::shared_ptr<Work> _this)
    {
        counter.fetch_add(1, std::memory_order_relaxed);
        if (running.load(std::memory_order_relaxed)) {
            start_the_work(_this->_strand.get_io_service());
        }
    }

    boost::asio::strand _strand;
};
// Blocking work: its handler sleeps for 5 seconds while holding its
// (possibly shared) strand implementation.
struct BlockingWork
{
    BlockingWork(boost::asio::io_service & io_service)
        : _strand(io_service)
    { }

    static void start_the_work(boost::asio::io_service & io_service)
    {
        std::shared_ptr<BlockingWork> _this(new BlockingWork(io_service));
        _this->_strand.get_io_service().post(_this->_strand.wrap(std::bind(do_the_work, _this)));
    }

    static void do_the_work(std::shared_ptr<BlockingWork> _this)
    {
        sleep(5);
    }

    boost::asio::strand _strand;
};
int main(int argc, char ** argv)
{
    boost::asio::io_service io_service;
    std::unique_ptr<boost::asio::io_service::work> work{new boost::asio::io_service::work(io_service)};

    for (std::size_t i = 0; i < 8; ++i) {
        Work::start_the_work(io_service);
    }

    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < 8; ++i) {
        workers.push_back(std::thread([&io_service] {
            io_service.run();
        }));
    }

    if (argc > 1) {
        std::cout << "Spawning a blocking work" << std::endl;
        workers.push_back(std::thread([&io_service] {
            io_service.run();
        }));
        BlockingWork::start_the_work(io_service);
    }

    sleep(5);
    running = false;
    work.reset();

    for (auto && worker : workers) {
        worker.join();
    }

    std::cout << "Work performed:" << counter.load() << std::endl;
    return 0;
}
Build it using this command:
g++ -o asio_strand_test_case -pthread -I/usr/include -std=c++11 asio_strand_test_case.cpp -lboost_system
A test run in the usual way:
time ./asio_strand_test_case
Work performed:6905372
real 0m5.027s
user 0m24.688s
sys 0m12.796s
A test run with the long blocking work:
time ./asio_strand_test_case 1
Spawning a blocking work
Work performed:770
real 0m5.031s
user 0m0.044s
sys 0m0.004s
The difference is dramatic. What happens is that each new non-blocking work creates a new strand object, until one of them ends up sharing an implementation with the strand of the blocking work. When that happens it is a dead end until the long work finishes.
Edit:
Reduced the amount of parallel work down to the number of worker threads (from 1000 to 8) and updated the test run output. I did this because the issue is more visible when the two numbers are close.
Well, an interesting issue and +1 for giving us a small example reproducing the exact issue.
The problem you are having, as I understand it, with the Boost implementation is that by default it instantiates only a limited number of strand_impl objects: 193, as I see in my version of Boost (1.59).
Now, what this means is that a large number of requests will be in contention, as they will be waiting for the lock to be released by another handler that happens to use the same instance of strand_impl.
My guess for why it is done this way is to avoid overloading the OS by creating lots and lots of mutexes, which would be bad. The current implementation allows the locks to be reused (and in a configurable way, as we will see below).
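To get a feel for how quickly unrelated strands start colliding on the same slot, here is a back-of-the-envelope sketch of my own (assuming each strand is hashed uniformly and independently into the 193 pooled implementations):

// Probability that at least two of k strands land in the same slot of a
// pool of 193 implementations (classic birthday-problem estimate).
#include <cstdio>

int main()
{
    const double slots = 193.0;
    const int ks[] = {8, 16, 32, 64};
    for (int k : ks) {
        double all_distinct = 1.0;
        for (int i = 0; i < k; ++i)
            all_distinct *= (slots - i) / slots;
        std::printf("%2d strands -> collision probability %.0f%%\n",
                    k, 100.0 * (1.0 - all_distinct));
    }
    return 0;
}

Under that uniform-hashing assumption a collision becomes more likely than not at around 17 strands, which helps explain why seemingly unrelated strands end up waiting on each other.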
In my setup:
MacBook-Pro:asio_test amuralid$ g++ -std=c++14 -O2 -o strand_issue strand_issue.cc -lboost_system -pthread
MacBook-Pro:asio_test amuralid$ time ./strand_issue
Work performed:489696
real 0m5.016s
user 0m1.620s
sys 0m4.069s
MacBook-Pro:asio_test amuralid$ time ./strand_issue 1
Spawning a blocking work
Work performed:188480
real 0m5.031s
user 0m0.611s
sys 0m1.495s
Now, there is a way to change this number of cached implementations by setting the macro BOOST_ASIO_STRAND_IMPLEMENTATIONS.
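As a side note (my assumption, not something shown in the answer): since this is an ordinary preprocessor macro, it can also be set in code, as long as the definition appears before the first asio include and every translation unit sees the same value:

// Assumption: the macro must be visible before the first Boost.Asio include,
// and every translation unit of the program must use the same value.
#define BOOST_ASIO_STRAND_IMPLEMENTATIONS 1024
#include <boost/asio.hpp>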
Below is the result I got after setting it to a value of 1024:
MacBook-Pro:asio_test amuralid$ g++ -std=c++14 -DBOOST_ASIO_STRAND_IMPLEMENTATIONS=1024 -o strand_issue strand_issue.cc -lboost_system -pthread
MacBook-Pro:asio_test amuralid$ time ./strand_issue
Work performed:450928
real 0m5.017s
user 0m2.708s
sys 0m3.902s
MacBook-Pro:asio_test amuralid$ time ./strand_issue 1
Spawning a blocking work
Work performed:458603
real 0m5.027s
user 0m2.611s
sys 0m3.902s
Almost the same for both cases! You might want to adjust the value of the macro as per your needs to keep the deviation small.