I'm trying to use robust mutexes on Linux to guard resources shared between processes, and in some situations they do not behave in the "robust" way. By "robust" I mean that pthread_mutex_lock should return EOWNERDEAD if the process owning the lock has terminated.
Here is the scenario where it doesn't work:
There are two processes, p1 and p2. p1 creates the robust mutex and, after user input, waits on it. p2 has two threads: thread 1 maps the mutex and acquires it; thread 2 (after thread 1 has acquired the mutex) also maps the same mutex and waits on it, since thread 1 owns it now. Note that p1 starts waiting on the mutex only after p2's thread 1 has already acquired it.
Now if we terminate p2, p1 never unblocks (its pthread_mutex_lock never returns), contrary to the expected robust behaviour, where p1 should unblock with the EOWNERDEAD error.
Here is the code:
p1.cpp:
#include <sys/types.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
struct MyMtx {
    pthread_mutex_t m;
};
int main(int argc, char **argv)
{
    int r;
    pthread_mutexattr_t ma;
    pthread_mutexattr_init(&ma);
    pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust_np(&ma, PTHREAD_MUTEX_ROBUST_NP);
    int fd = shm_open("/test_mtx_p", O_RDWR|O_CREAT, 0666);
    ftruncate(fd, sizeof(MyMtx));
    MyMtx *m = (MyMtx *)mmap(NULL, sizeof(MyMtx),
                             PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    //close (fd);
    pthread_mutex_init(&m->m, &ma);
    puts("Press Enter to lock mutex");
    fgetc(stdin);
    puts("locking...");
    r = pthread_mutex_lock(&m->m);
    printf("pthread_mutex_lock returned %d\n", r);
    puts("Press Enter to unlock");
    fgetc(stdin);
    r = pthread_mutex_unlock(&m->m);
    printf("pthread_mutex_unlock returned %d\n", r);
    puts("Before pthread_mutex_destroy");
    r = pthread_mutex_destroy(&m->m);
    printf("After pthread_mutex_destroy, r=%d\n", r);
    munmap(m, sizeof(MyMtx));
    shm_unlink("/test_mtx_p");
    return 0;
}
p2.cpp:
#include <sys/types.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
struct MyMtx {
    pthread_mutex_t m;
};
static void *threadFunc(void *arg)
{
    int fd = shm_open("/test_mtx_p", O_RDWR|O_CREAT, 0666);
    ftruncate(fd, sizeof(MyMtx));
    MyMtx *m = (MyMtx *)mmap(NULL, sizeof(MyMtx),
                             PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    sleep(2); //to let the first thread lock the mutex
    puts("Locking from another thread");
    int r = 0;
    r = pthread_mutex_lock(&m->m);
    printf("locked from another thread r=%d\n", r);
    return NULL; //a non-void function must return a value
}
int main(int argc, char **argv)
{
    int r;
    int fd = shm_open("/test_mtx_p", O_RDWR|O_CREAT, 0666);
    ftruncate(fd, sizeof(MyMtx));
    MyMtx *m = (MyMtx *)mmap(NULL, sizeof(MyMtx),
                             PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    //close (fd);
    pthread_t tid;
    pthread_create(&tid, NULL, threadFunc, NULL);
    puts("locking");
    r = pthread_mutex_lock(&m->m);
    printf("pthread_mutex_lock returned %d\n", r);
    puts("Press Enter to terminate");
    fgetc(stdin);
    kill(getpid(), 9);
    return 0;
}
First run p1, then run p2 and wait until it prints "Locking from another thread". Press Enter in p1's shell to lock the mutex, then press Enter in p2's shell to terminate p2 (or kill it some other way). You will see that p1 prints "locking..." and pthread_mutex_lock never returns.
The problem doesn't happen every time; it seems to depend on timing. If you let some time elapse between p1 starting to lock and p2 being terminated, it sometimes works and p1's pthread_mutex_lock returns 130 (EOWNERDEAD). But if you terminate p2 right after, or shortly after, p1 starts waiting on the mutex, p1 never unblocks.
Has anybody else ever encountered the same issue?
Robustness defines the behavior when the owner of the mutex terminates without unlocking it, usually because its process terminated abnormally.
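For reference, the behaviour the question expects can be reproduced inside a single process. The sketch below is only an illustration, not part of the original test programs: the helper names (die_holding_lock, lock_robust, demo_owner_died) are made up here, and it assumes a glibc that provides the standardized pthread_mutexattr_setrobust and pthread_mutex_consistent rather than the older _np variants used in p1.cpp. A thread exits while holding the mutex, and the next locker is expected to see EOWNERDEAD and repair the mutex.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>

// Hypothetical helper: takes the lock and exits without unlocking,
// standing in for an owner that died while holding the mutex.
static void *die_holding_lock(void *arg)
{
    pthread_mutex_lock(static_cast<pthread_mutex_t *>(arg));
    return nullptr; // the thread ends here, the mutex stays locked
}

// Hypothetical helper: lock a robust mutex and recover it if the
// previous owner died. Returns 0 on success or an errno-style code.
static int lock_robust(pthread_mutex_t *m, bool *recovered)
{
    *recovered = false;
    int r = pthread_mutex_lock(m);
    if (r == EOWNERDEAD) {
        // The caller owns the lock now, but the protected data may be
        // inconsistent. Real code would repair that data first.
        *recovered = true;
        r = pthread_mutex_consistent(m);
    }
    return r;
}

// Returns true if EOWNERDEAD was reported and the mutex was recovered.
static bool demo_owner_died()
{
    pthread_mutexattr_t ma;
    pthread_mutexattr_init(&ma);
    pthread_mutexattr_setrobust(&ma, PTHREAD_MUTEX_ROBUST);

    pthread_mutex_t m;
    int rc = pthread_mutex_init(&m, &ma);
    assert(rc == 0);
    pthread_mutexattr_destroy(&ma);

    pthread_t tid;
    rc = pthread_create(&tid, nullptr, die_holding_lock, &m);
    assert(rc == 0);
    (void)rc;
    pthread_join(tid, nullptr); // the owner is gone after this point

    bool recovered = false;
    int r = lock_robust(&m, &recovered);
    if (r == 0)
        pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return r == 0 && recovered;
}
```

Calling demo_owner_died() from main should return true on a system where robust mutexes work as documented. The scenario in the question differs in that the dead owner lives in another process and a second waiter is queued ahead of p1.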
Just verified this behaviour with glibc 2.11.1 on Linux kernel 2.6.32 and newer.
My first finding: if you hit Enter in p1 before p2 prints "Locking from another thread" (that is, within 2 seconds), the robust mutex works as one would expect. Conclusion: the ordering of the waiting threads is important.
The first waiting thread gets woken up. Unfortunately, that is the thread within p2, which is being killed at that very moment, so p1 keeps waiting.
See https://lkml.org/lkml/2013/9/27/338 for a description of the problem.
I don't know whether there are kernel fixes or patches around, or even whether it is considered a bug at all.
Nevertheless, there seems to be a workaround for the whole mess. Use robust mutexes with PTHREAD_PRIO_INHERIT:
pthread_mutexattr_setprotocol(&ma, PTHREAD_PRIO_INHERIT);
Inside the kernel (futex.c), instead of handle_futex_death(), a different mechanism within exit_pi_state_list() handles waking up the other mutex waiters. It seems to solve the problem.