I have read in many places that there is some overhead associated with std::condition_variable_any. Just wondering, what is this overhead?
My guess is that, since this is a generic condition variable that can work with any type of lock, it requires a hand-rolled implementation of waiting (perhaps with another condition_variable and mutex, or a futex, or something similar), so the extra overhead probably comes from that, as opposed to it just being a thin wrapper around pthread_cond_wait() (and the equivalent on other systems). But I'm not sure.
As a follow-up: if I were implementing something that waits on, say, a shared mutex, is this type of condition variable a bad choice because of the performance overhead? What else can I do in this situation?
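For concreteness, here is roughly the kind of thing I have in mind (the data_ready flag is just illustrative): readers wait while holding a std::shared_lock on a std::shared_mutex, which std::condition_variable cannot do, so std::condition_variable_any seems to be the only standard option:

```cpp
#include <condition_variable>
#include <shared_mutex>

std::shared_mutex m;
std::condition_variable_any cv;
bool data_ready = false;   // illustrative predicate, protected by m

void reader()
{
    std::shared_lock<std::shared_mutex> lock(m);   // shared (read) ownership
    cv.wait(lock, [] { return data_ready; });      // only _any accepts a shared_lock
    // ... read the shared data ...
}

void writer()
{
    {
        std::unique_lock<std::shared_mutex> lock(m);  // exclusive ownership
        data_ready = true;
    }
    cv.notify_all();
}
```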
pthread_cond_wait() / SleepConditionVariableSRW(), which the plain std::condition_variable::wait() maps onto, require just a single, atomic syscall for releasing the mutex, waiting on the condition variable, and re-acquiring the mutex. The thread immediately goes to sleep and another thread - ideally one that was blocked on the mutex - can take over immediately on the same core.
With std::condition_variable_any, unlocking the passed BasicLockable and starting to wait on the native event / condition is more than a single syscall: it invokes the unlock() method on the BasicLockable first and only then issues the syscall for waiting. So you have at least the overhead of the separate unlock(), plus you are more likely to trigger a less-than-ideal scheduling decision on the OS side. In the worst case, the unlock even causes a waiting thread to resume on a different core, with all the associated overhead.
The other way around, e.g. on spurious wakeups, there are also OS-side scheduling optimizations possible when dealing with a native mutex (as used by std::mutex) which don't apply to a generic BasicLockable.
Both involve some bookkeeping in order to provide the notify_all() logic (it's actually one event / condition per waiting thread) as well as the guarantee that all methods are atomic, so both come with a small overhead anyway.
The real overhead depends on how well the OS can make a good scheduling decision on the combined signal-and-wait-and-lock syscall. If the OS isn't smart about the scheduling, then it makes virtually no difference.