I am using condition variables to synchronize two threads (pthreads) and am seeing unexpected behaviour: even though I have verified that a thread is already waiting on a condition, it does not wake when another thread signals that condition.
It may be worth noting that the program runs as expected in a desktop environment; the issue only arises when I run it in an embedded environment using uClibc.
To troubleshoot, I stripped my code down to just the two threads performing the locking, unlocking, and signalling, which is listed below:
#include <stdio.h>
#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>   // for sleep()

pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mutex2 = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t condition1 = PTHREAD_COND_INITIALIZER;
pthread_cond_t condition2 = PTHREAD_COND_INITIALIZER;
bool predicate1 = false;
bool predicate2 = false;

static void * ThreadFunc2(void * arg) {
    sleep(1); // For testing purposes, ensures this thread is run after Thread1
    pthread_mutex_lock(&mutex2);
    while(1) {
        pthread_mutex_lock(&mutex1);
        // Do some work - Eg receive some data from a socket
        predicate1 = false;
        pthread_cond_signal(&condition1);
        pthread_mutex_unlock(&mutex1);
        predicate2 = true;
        while(predicate2 == true)
            pthread_cond_wait(&condition2, &mutex2);
        // Do some more work - Eg send response data to socket
    }
}

static void * ThreadFunc1(void * arg) {
    int result;
    pthread_mutex_lock(&mutex1);
    while(1) {
        predicate1 = true;
        while(predicate1 == true)
            pthread_cond_wait(&condition1, &mutex1);
        // Do some work - Eg process data on the socket and prepare response data to be sent
        pthread_mutex_lock(&mutex2);
        predicate2 = false;
        pthread_cond_signal(&condition2);
        pthread_mutex_unlock(&mutex2);
    }
}

int main(int argc, char * argv[]) {
    pthread_t thread1Id, thread2Id;
    pthread_create(&thread1Id, NULL, ThreadFunc1, NULL);
    pthread_create(&thread2Id, NULL, ThreadFunc2, NULL);
    while(1) {
        sleep(1);
    }
    return 0;
}
If I exclude all statements relating to mutex2/condition2/predicate2, the two threads work together as expected.
With the code as listed above, after a short time (since all work has been stripped out, each loop runs very quickly) the wait on condition1 in ThreadFunc1 does not wake even though it is signalled by ThreadFunc2, and the application hangs.
To help me debug, I also redefined the pthread_* functions to print a message to stdout with the matching line number before calling the actual pthread_* function. This allowed me to follow the flow of each pthread operation and verify that the signal was being sent to a condition that already had a waiter.
Can anyone help shed some light on any potential issue(s) in my implementation above?
Thanks in advance for any suggestions.
The pthread_cond_wait() function atomically unlocks the mutex and performs the wait for the condition. In this case, atomically means with respect to the mutex and the condition variable and other threads' access to those objects through the pthread condition variable interfaces.
The pthread_cond_wait() function blocks the calling thread, waiting for the condition specified by cond to be signaled or broadcast to. When pthread_cond_wait() is called, the calling thread must have mutex locked.
A wakeup is called spurious when the thread has seemingly been awakened for no reason. But spurious wakeups don't happen for no reason: they usually happen because, between the time the condition variable was signaled and the time the waiting thread finally ran, another thread ran and changed the condition.
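To make the two points above concrete, here is a minimal sketch of the usual wait pattern: the waiter re-checks its predicate in a loop, so a spurious wakeup (or a wakeup after the state has changed again) just sends it back to waiting. The names `demo_mutex`, `ready_cond`, `data_ready`, `consumed`, and `run_wait_demo` are made up for this example and are not from the question.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical names for this sketch; none come from the question. */
static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready_cond = PTHREAD_COND_INITIALIZER;
static bool data_ready = false;
static int consumed = 0;

static void *waiter(void *arg) {
    (void)arg;
    pthread_mutex_lock(&demo_mutex);
    /* Re-check the predicate after every return from pthread_cond_wait():
       the call may return spuriously, so the predicate is the only
       reliable indication that there is something to do. */
    while (!data_ready) {
        /* Atomically unlocks demo_mutex and blocks; re-locks before returning. */
        pthread_cond_wait(&ready_cond, &demo_mutex);
    }
    /* The mutex is held again here, so the shared state can be touched. */
    data_ready = false;
    consumed++;
    pthread_mutex_unlock(&demo_mutex);
    return NULL;
}

/* Starts one waiter, publishes the predicate, and returns how many
   times the waiter consumed it. */
static int run_wait_demo(void) {
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, waiter, NULL);
    assert(rc == 0);
    (void)rc;

    pthread_mutex_lock(&demo_mutex);
    data_ready = true;                  /* change the predicate under the mutex */
    pthread_cond_signal(&ready_cond);
    pthread_mutex_unlock(&demo_mutex);

    pthread_join(tid, NULL);            /* after join, reading consumed is safe */
    return consumed;
}
```

Because the predicate is set by the signalling side, this works whether the waiter reaches pthread_cond_wait() before or after the signal is sent.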
First, if there are threads waiting on the signaled condition variable, the monitor will allow one of the waiting threads to resume its execution and give this thread the monitor lock back. Second, if there is no waiting thread on the signaled condition variable, this signal is lost as if it never occurs.
Your mistake is that you do not unlock the mutex used by the condition variable after the calls to pthread_cond_wait().
That is, pthread_cond_wait() unlocks the mutex internally while the thread is blocked, but it re-acquires the lock when it wakes up, and you need to release it explicitly.
See this tutorial for more details on condition variables: https://computing.llnl.gov/tutorials/pthreads/#ConditionVariables
I experienced similar problems. In my case, the signal was sometimes sent before the other thread had started waiting. When that happened, both threads were "stuck". We solved it by adding a flag that records that a signal was sent.
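A flag like that could look roughly like the sketch below. It is only an illustration of the idea, not the code we actually used; `event_t`, `event_post`, `event_wait`, and `post_before_wait_demo` are made-up names. The poster sets the flag under the mutex, and the waiter checks the flag before blocking, so a signal sent "too early" is not lost.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical one-shot event built from a mutex, a condition variable,
   and a flag that remembers whether a signal was sent. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    bool signaled;
} event_t;

static void event_init(event_t *e) {
    int rc = pthread_mutex_init(&e->mutex, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&e->cond, NULL);
    assert(rc == 0);
    (void)rc;
    e->signaled = false;
}

static void event_post(event_t *e) {
    pthread_mutex_lock(&e->mutex);
    e->signaled = true;               /* remember the signal */
    pthread_cond_signal(&e->cond);    /* wake a waiter, if there is one */
    pthread_mutex_unlock(&e->mutex);
}

static void event_wait(event_t *e) {
    pthread_mutex_lock(&e->mutex);
    while (!e->signaled)              /* skipped if the post already happened */
        pthread_cond_wait(&e->cond, &e->mutex);
    e->signaled = false;              /* consume the signal */
    pthread_mutex_unlock(&e->mutex);
}

/* Posts before anyone waits, then waits. Returns 1 if the wait
   completed and consumed the flag. */
static int post_before_wait_demo(void) {
    event_t e;
    event_init(&e);
    event_post(&e);                   /* no waiter yet */
    event_wait(&e);                   /* returns at once: the flag is set */
    int consumed = !e.signaled;
    pthread_cond_destroy(&e.cond);
    pthread_mutex_destroy(&e.mutex);
    return consumed;
}
```

Without the flag, the `event_wait()` call in `post_before_wait_demo()` would block forever, which matches the "stuck" behaviour described above.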
Solution (see the explanation below)
Calling pthread_mutex_unlock() before the signalling call pthread_cond_signal(), instead of after it, should solve the issue:
...
pthread_mutex_lock(&mutex1);
predicate1 = false;
pthread_mutex_unlock(&mutex1);
pthread_cond_signal(&condition1);
...
in function ThreadFunc2 and similarly for thread 1
...
pthread_mutex_lock(&mutex2);
predicate2 = false;
pthread_mutex_unlock(&mutex2);
pthread_cond_signal(&condition2);
...
in function ThreadFunc1.
Explanation: in your program, thread 2 enters the signalling call
pthread_cond_signal(&condition1); // thread 2 with mutex1 locked
with mutex1 locked. Thread 1 can only leave the blocking
pthread_cond_wait(&condition1, &mutex1); // thread 1 leaves only after mutex1 is unlocked
call once it has locked mutex1 again, which is the guaranteed behavior of this function. That means every other thread must have unlocked mutex1 before thread 1 can continue. If your implementation of pthread_cond_signal() blocks until the thread that receives the signal continues, then a deadlock results when the call is entered with the corresponding mutex locked. That could also explain why one environment works fine while the other doesn't: for instance, your desktop environment may not have a blocking pthread_cond_signal() while your embedded environment does.
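As a self-contained illustration of the unlock-then-signal ordering, here is a minimal sketch with one worker thread. All names (`work_mutex`, `work_cond`, `work_pending`, `notify_unlocked`, `run_unlock_then_signal_demo`) are made up for this example. Unlike the code in the question, the flag is set by the signalling side and cleared by the waiter, so the sketch does not depend on which thread starts first.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical names for this sketch; none come from the question. */
static pthread_mutex_t work_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t work_cond = PTHREAD_COND_INITIALIZER;
static bool work_pending = false;
static int handled = 0;

static void *worker(void *arg) {
    (void)arg;
    pthread_mutex_lock(&work_mutex);
    while (!work_pending)                       /* predicate loop */
        pthread_cond_wait(&work_cond, &work_mutex);
    work_pending = false;
    handled++;
    pthread_mutex_unlock(&work_mutex);
    return NULL;
}

/* Update the predicate under the mutex, release the mutex,
   and only then signal the condition variable. */
static void notify_unlocked(void) {
    pthread_mutex_lock(&work_mutex);
    work_pending = true;
    pthread_mutex_unlock(&work_mutex);
    pthread_cond_signal(&work_cond);            /* mutex is not held here */
}

/* Returns the number of notifications the worker handled. */
static int run_unlock_then_signal_demo(void) {
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, worker, NULL);
    assert(rc == 0);
    (void)rc;
    notify_unlocked();
    pthread_join(tid, NULL);                    /* after join, handled is stable */
    return handled;
}
```

The predicate is still only modified while the mutex is held; just the pthread_cond_signal() call moves outside the critical section, so the woken thread does not have to contend with the signaller for the mutex.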