Bug with robust mutex

Tags: c++, linux, mutex, pthreads, ipc

I'm trying to use robust mutexes on Linux to guard resources shared between processes, and in some situations they do not seem to behave in a "robust" way. By "robust" I mean that pthread_mutex_lock should return EOWNERDEAD if the process owning the lock has terminated.
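For reference, the expected "robust" behaviour can be exercised inside a single process. The following is only a minimal sketch and is not part of p1.cpp or p2.cpp: the helper names `init_robust_mutex`, `lock_robust` and `die_holding_lock` are hypothetical, and it assumes glibc 2.12 or newer, which provides the POSIX names `pthread_mutexattr_setrobust` and `pthread_mutex_consistent` in place of the `_np` variants.

```cpp
#include <cerrno>
#include <pthread.h>

// Hypothetical helper: initialise a robust mutex (process-private here,
// to keep the sketch small). Returns 0 or an errno-style error code.
int init_robust_mutex(pthread_mutex_t *mtx)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    if (rc == 0)
        rc = pthread_mutex_init(mtx, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

// Hypothetical helper: lock the mutex and, if the previous owner died,
// mark the mutex consistent so that it stays usable after unlock.
// Returns 0 on a normal lock, EOWNERDEAD if the lock was recovered
// (the caller holds the mutex in both cases), or another error code.
int lock_robust(pthread_mutex_t *mtx)
{
    int rc = pthread_mutex_lock(mtx);
    if (rc == EOWNERDEAD) {
        // The protected data may be half-updated; repair it here.
        int crc = pthread_mutex_consistent(mtx);
        if (crc != 0)
            return crc;
    }
    return rc;
}

// Thread body that simulates a dying owner: it takes the lock and
// terminates without ever unlocking it.
void *die_holding_lock(void *arg)
{
    pthread_mutex_lock(static_cast<pthread_mutex_t *>(arg));
    return nullptr;
}
```

If a thread running `die_holding_lock` is created and joined, the next `lock_robust` call on the same mutex should report EOWNERDEAD, and after an unlock the mutex should lock normally again.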

Here is the scenario where it doesn't work:

There are 2 processes, p1 and p2. p1 creates the robust mutex and waits on it (after user input). p2 has 2 threads: thread 1 maps the mutex and acquires it. Thread 2 (after thread 1 has acquired the mutex) also maps the same mutex and waits on it (since thread 1 owns it now). Also note that p1 starts waiting on the mutex after p2-thread1 has already acquired it.

Now if we terminate p2, p1 never unblocks (meaning its pthread_mutex_lock never returns), contrary to the supposed "robustness" where p1 should unblock with an EOWNERDEAD error.

Here is the code:

p1.cpp:

#include <sys/types.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

struct MyMtx {
    pthread_mutex_t m;
};

int main(int argc, char **argv)
{
    int r;

    pthread_mutexattr_t ma;
    pthread_mutexattr_init(&ma);
    pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust_np(&ma, PTHREAD_MUTEX_ROBUST_NP);

    int fd = shm_open("/test_mtx_p", O_RDWR|O_CREAT, 0666);
    ftruncate(fd, sizeof(MyMtx));

    MyMtx *m = (MyMtx *)mmap(NULL, sizeof(MyMtx),
        PROT_READ | PROT_WRITE, MAP_SHARED,fd, 0);
    //close (fd);

    pthread_mutex_init(&m->m, &ma);

    puts("Press Enter to lock mutex");
    fgetc(stdin);

    puts("locking...");
    r = pthread_mutex_lock(&m->m);
    printf("pthread_mutex_lock returned %d\n", r);

    puts("Press Enter to unlock");
    fgetc(stdin);
    r = pthread_mutex_unlock(&m->m);
    printf("pthread_mutex_unlock returned %d\n", r);

    puts("Before pthread_mutex_destroy");
    r = pthread_mutex_destroy(&m->m);
    printf("After pthread_mutex_destroy, r=%d\n", r);

    munmap(m, sizeof(MyMtx));
    shm_unlink("/test_mtx_p");

    return 0;
}

p2.cpp:

#include <sys/types.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>

struct MyMtx {
    pthread_mutex_t m;
};

static void *threadFunc(void *arg)
{
    int fd = shm_open("/test_mtx_p", O_RDWR|O_CREAT, 0666);
    ftruncate(fd, sizeof(MyMtx));

    MyMtx *m = (MyMtx *)mmap(NULL, sizeof(MyMtx),
        PROT_READ | PROT_WRITE, MAP_SHARED,fd, 0);
    sleep(2); //to let the first thread lock the mutex
    puts("Locking from another thread");
    int r = 0;
    r = pthread_mutex_lock(&m->m);
    printf("locked from another thread r=%d\n", r);
    return NULL; // a void* thread function must return a value
}

int main(int argc, char **argv)
{
    int r;
    int fd = shm_open("/test_mtx_p", O_RDWR|O_CREAT, 0666);
    ftruncate(fd, sizeof(MyMtx));

    MyMtx *m = (MyMtx *)mmap(NULL, sizeof(MyMtx),
        PROT_READ | PROT_WRITE, MAP_SHARED,fd, 0);
    //close (fd);

    pthread_t tid;
    pthread_create(&tid, NULL, threadFunc, NULL);

    puts("locking");
    r = pthread_mutex_lock(&m->m);
    printf("pthread_mutex_lock returned %d\n", r);

    puts("Press Enter to terminate");
    fgetc(stdin);

    kill(getpid(), 9);
    return 0;
}

First, run p1, then run p2 and wait until it prints "Locking from another thread". Press Enter on p1's shell to lock the mutex, then press Enter on p2's shell to terminate p2, or you can just kill it some other way. You will see that p1 prints "locking..." and pthread_mutex_lock never returns.

The problem doesn't happen every time; it seems to depend on timing. If you let some time elapse between p1 starting to lock and p2 being terminated, it sometimes works and p1's pthread_mutex_lock returns 130 (EOWNERDEAD). But if you terminate p2 right after, or a short time after, p1 starts waiting on the mutex, p1 never unblocks.
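Because the hang is timing dependent, it can help to make the waiter give up after a while instead of blocking forever, so that a lost wake-up shows up as a return code. This is only a diagnostic sketch, not a fix: `lock_with_timeout` is a hypothetical helper name, and it relies on `pthread_mutex_timedlock` taking an absolute CLOCK_REALTIME deadline.

```cpp
#include <cerrno>
#include <pthread.h>
#include <time.h>

// Hypothetical diagnostic helper: try to lock `mtx` for at most
// `seconds` seconds. Returns 0 on success, EOWNERDEAD if a robust
// mutex was recovered from a dead owner, ETIMEDOUT if the deadline
// passed, or another errno-style error code.
int lock_with_timeout(pthread_mutex_t *mtx, int seconds)
{
    struct timespec deadline;
    // pthread_mutex_timedlock expects an absolute CLOCK_REALTIME time.
    if (clock_gettime(CLOCK_REALTIME, &deadline) != 0)
        return errno;
    deadline.tv_sec += seconds;
    return pthread_mutex_timedlock(mtx, &deadline);
}
```

Replacing the pthread_mutex_lock call in p1 with something like `lock_with_timeout(&m->m, 10)` would let p1 print ETIMEDOUT in the failing case instead of hanging.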

Has anybody else ever encountered the same issue?

asked Dec 13 '13 00:12

Yevgeniy P

1 Answer

I just verified this behaviour with glibc 2.11.1 on Linux kernel 2.6.32 and newer.

My first finding: if and only if you hit Enter in p1 before p2 prints "Locking from another thread" (that is, within the 2 seconds of sleep), the robust mutex works as one would expect. Conclusion: the order of the waiting threads matters.

The first waiting thread gets woken up. Unfortunately, that is the thread within p2, which is being killed at that very moment, so the wake-up never reaches p1.

See https://lkml.org/lkml/2013/9/27/338 for a description of the problem.

I don't know whether there are kernel fixes or patches around. I don't even know whether it is considered a bug at all.

Nevertheless, there seems to be a workaround for the whole mess. Use robust mutexes with PTHREAD_PRIO_INHERIT:

pthread_mutexattr_setprotocol(&ma, PTHREAD_PRIO_INHERIT);

Inside the kernel (futex.c), the wake-up of the other mutex waiters is then handled by a different mechanism within exit_pi_state_list() instead of handle_futex_death(). This seems to solve the problem.
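Put together with the attributes from p1.cpp, the workaround could look roughly like the sketch below. The helper name `make_robust_pi_attr` is hypothetical, and the sketch assumes glibc 2.12 or newer for the POSIX name `pthread_mutexattr_setrobust`; on older versions the `_np` variant from the question would be used instead.

```cpp
#include <pthread.h>

// Hypothetical helper: configure `attr` for a process-shared, robust,
// priority-inheritance mutex. Returns 0 on success or an errno-style
// error code; on failure the attribute object is destroyed again.
int make_robust_pi_attr(pthread_mutexattr_t *attr)
{
    int rc = pthread_mutexattr_init(attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_setpshared(attr, PTHREAD_PROCESS_SHARED);
    if (rc == 0)
        rc = pthread_mutexattr_setrobust(attr, PTHREAD_MUTEX_ROBUST);
    if (rc == 0)
        // The workaround: switch the mutex to the PI futex code path.
        rc = pthread_mutexattr_setprotocol(attr, PTHREAD_PRIO_INHERIT);
    if (rc != 0)
        pthread_mutexattr_destroy(attr);
    return rc;
}
```

In p1.cpp the resulting attribute object would be passed to `pthread_mutex_init(&m->m, &attr)` in place of `ma`. Checking the return value of pthread_mutex_init is worthwhile here, since priority-inheritance mutexes need kernel support.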

answered Nov 14 '22 23:11

NorbertM
