I came across this interesting paragraph in the Boost thread documentation today:
void wait(boost::unique_lock<boost::mutex>& lock)
...
Effects: Atomically call lock.unlock() and blocks the current thread. The thread will unblock when notified by a call to this->notify_one() or this->notify_all(), or spuriously. When the thread is unblocked (for whatever reason), the lock is reacquired by invoking lock.lock() before the call to wait returns. The lock is also reacquired by invoking lock.lock() if the function exits with an exception.
So what I am interested in is the meaning of the word "spuriously". Why would the thread be unblocked for spurious reasons? What can be done to resolve this?
This article by Anthony Williams is particularly detailed.
Spurious wakes cannot be predicted: they are essentially random from the user's point of view. However, they commonly occur when the thread library cannot reliably ensure that a waiting thread will not miss a notification. Since a missed notification would render the condition variable useless, the thread library wakes the thread from its wait rather than take the risk.
He also points out that you shouldn't use the timed_wait overloads that take a duration, and that you should generally use the versions that take a predicate:
That's the beginner's bug, and one that's easily overcome with a simple rule: always check your predicate in a loop when waiting with a condition variable. The more insidious bug comes from timed_wait().
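To make the timed_wait point concrete: if a relative timeout is passed inside the predicate loop, every spurious wake restarts the full timeout, so the total wait can run far past what was intended. One way around this is to compute an absolute deadline once and wait against that. The following is only a sketch under assumptions: it uses std::condition_variable (whose wait_until takes an absolute time point) rather than Boost, and the names Flag and wait_ready are made up for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical shared state for the illustration.
struct Flag {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;
};

// Waits until f.ready is true or the total budget has elapsed.
// Returns the value of f.ready at the moment the wait ended.
bool wait_ready(Flag& f, std::chrono::milliseconds budget) {
    assert(budget >= std::chrono::milliseconds::zero());
    // Fix the deadline once, so spurious wakes do not extend the total wait.
    const auto deadline = std::chrono::steady_clock::now() + budget;
    std::unique_lock<std::mutex> lock(f.m);
    while (!f.ready) {
        if (f.cv.wait_until(lock, deadline) == std::cv_status::timeout) {
            return f.ready;  // one last check of the predicate on timeout
        }
    }
    return true;
}
```

With Boost, the equivalent should be the timed_wait overload that takes an absolute time, ideally the variant that also takes a predicate.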
This article by Vladimir Prus is also interesting.
But why do we need the while loop, can't we write:
if (!something_happened)
    c.wait(m);
We can't. And the killer reason is that 'wait' can return without any 'notify' call. That's called spurious wakeup and is explicitly allowed by POSIX. Essentially, return from 'wait' only indicates that the shared data might have changed, so that data must be evaluated again.
Okay, so why this is not fixed yet? The first reason is that nobody wants to fix it. Wrapping call to 'wait' in a loop is very desired for several other reasons. But those reasons require explanation, while spurious wakeup is a hammer that can be applied to any first year student without fail.
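For comparison with the `if` version quoted above, here is a minimal sketch of the loop form. It is written against std::condition_variable, on the assumption that its wait call has the same shape as the Boost one; the Event struct and the function names are hypothetical and only there to make the example complete.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical shared state, with names borrowed from the quoted snippet.
struct Event {
    std::mutex m;
    std::condition_variable c;
    bool something_happened = false;
};

// Blocks until the flag is set. The flag is re-checked after every return
// from wait(), so a spurious wakeup simply goes back to waiting.
void wait_for_event(Event& e) {
    std::unique_lock<std::mutex> lock(e.m);
    while (!e.something_happened) {
        e.c.wait(lock);  // unlocks while blocked, relocks before returning
    }
    assert(lock.owns_lock());  // the lock is held again at this point
}

// Sets the flag while holding the mutex, then wakes any waiters.
void signal_event(Event& e) {
    {
        std::lock_guard<std::mutex> guard(e.m);
        e.something_happened = true;
    }
    e.c.notify_all();
}

// Runs one waiter and one notifier; returns the flag value the waiter saw.
bool run_demo() {
    Event e;
    bool seen = false;
    std::thread waiter([&] {
        wait_for_event(e);
        std::lock_guard<std::mutex> guard(e.m);
        seen = e.something_happened;
    });
    signal_event(e);
    waiter.join();
    return seen;
}
```

The loop also covers the case where the notification arrives before the waiter starts: the flag is already set, so wait() is never called. The overload that takes a predicate, e.c.wait(lock, pred), is shorthand for the same loop.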
This blog post gives a reason for Linux, in terms of the futex system call returning when a signal is delivered to a process. Unfortunately it doesn't explain anything else (and indeed is asking for more information).
The Wikipedia entry on spurious wakeups (which appear to be a POSIX-wide concept, by the way, not limited to Boost) may interest you too.