How to use pthread_atfork() and pthread_once() to reinitialize mutexes in child processes

We have a C++ shared library that uses ZeroC's Ice library for RPC. Unless we shut down Ice's runtime, we've observed child processes hanging on random mutexes. The Ice runtime starts threads, has many internal mutexes, and keeps open file descriptors to servers.

Additionally, we have a few mutexes of our own to protect our internal state.

Our shared library is used by hundreds of internal applications, so we don't control when a process calls fork(). We need a way to safely shut down Ice and lock our mutexes while the process forks.

The POSIX standard's text on pthread_atfork() says this about handling mutexes and internal state:

Alternatively, some libraries might have been able to supply just a child routine that reinitializes the mutexes in the library and all associated states to some known value (for example, what it was when the image was originally executed). This approach is not possible, though, because implementations are allowed to fail *_init() and *_destroy() calls for mutexes and locks if the mutex or lock is still locked. In this case, the child routine is not able to reinitialize the mutexes and locks.

On Linux, this test C program returns EPERM from pthread_mutex_unlock() in the child pthread_atfork() handler. Linux requires adding _NP to the PTHREAD_MUTEX_ERRORCHECK macro for it to compile.

This program is linked from this good thread.
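The linked test program is not reproduced here. As a rough sketch of what such a test might look like (the names test_mutex, child_unlock_result, and unlock_result_in_child are made up for illustration, and defining _XOPEN_SOURCE is an assumption about how to expose the macro without the _NP suffix on newer glibc), the child handler's return code can be passed back to the parent through the exit status:

```c
#define _XOPEN_SOURCE 700  /* assumption: exposes PTHREAD_MUTEX_ERRORCHECK on glibc */
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t test_mutex;     /* hypothetical error-checking mutex */
static int child_unlock_result = -1;   /* set by the child handler */

static void prepare_handler(void) { pthread_mutex_lock(&test_mutex); }
static void parent_handler(void)  { pthread_mutex_unlock(&test_mutex); }

static void child_handler(void)
{
    /* The mutex was locked by the forking thread in the parent. */
    child_unlock_result = pthread_mutex_unlock(&test_mutex);
}

/* Forks once and returns the error code that pthread_mutex_unlock()
 * produced in the child's atfork handler, or -1 if the test could not run.
 * Intended to be called a single time per process. */
int unlock_result_in_child(void)
{
    pthread_mutexattr_t attr;
    if (pthread_mutexattr_init(&attr) != 0)
        return -1;
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    int rc = pthread_mutex_init(&test_mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0)
        return -1;

    if (pthread_atfork(prepare_handler, parent_handler, child_handler) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(child_unlock_result);    /* small error codes fit in the exit status */

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Comparing the returned value against EPERM shows whether a given platform rejects the unlock in the child. The result is implementation-dependent, so a 0 on some other system would not make the unlock legal.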

Given that it's technically not safe or legal to unlock or destroy a mutex in the child, I'm thinking it's better to have pointers to mutexes and then have the child make new pthread_mutex_t on the heap and leave the parent's mutexes alone, thereby having a small memory leak.

The only issue is how to reinitialize the state of the library, and I'm thinking of resetting a pthread_once_t. Since POSIX provides an initializer for pthread_once_t, maybe it can be reset to its initial state.

#include <pthread.h>
#include <stdlib.h>
#include <string.h>

static pthread_once_t once_control = PTHREAD_ONCE_INIT;

static pthread_mutex_t *mutex_ptr = 0;

static void
setup_new_mutex()
{
    mutex_ptr = malloc(sizeof(*mutex_ptr));
    pthread_mutex_init(mutex_ptr, 0);
}

static void
prepare()
{
    pthread_mutex_lock(mutex_ptr);
}

static void
parent()
{
    pthread_mutex_unlock(mutex_ptr);
}

static void
child()
{
    // Reset the once control.
    pthread_once_t once = PTHREAD_ONCE_INIT;
    memcpy(&once_control, &once, sizeof(once_control));
}

static void
init()
{
    setup_new_mutex();
    pthread_atfork(&prepare, &parent, &child);
}

int
my_library_call(int arg)
{
    pthread_once(&once_control, &init);

    pthread_mutex_lock(mutex_ptr);
    // Do something here that requires the lock.
    int result = 2*arg;
    pthread_mutex_unlock(mutex_ptr);

    return result;
}

In the sample above, child() only resets the pthread_once_t by copying over it a fresh pthread_once_t initialized with PTHREAD_ONCE_INIT. A new pthread_mutex_t is created only when the library function is invoked in the child process.

This is hacky, but it may be the best way of dealing with this while skirting the standard. If the pthread_once_t contains a mutex, then the system must have a way of initializing it from its PTHREAD_ONCE_INIT state. If it contains a pointer to a mutex allocated on the heap, then it'll be forced to allocate a new one and set the address in the pthread_once_t. I'm hoping it doesn't use the address of the pthread_once_t for anything special, which would defeat this.

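Searching the comp.programming.threads group for pthread_atfork() (http://groups.google.com/group/comp.programming.threads/search?group=comp.programming.threads&q=pthread_atfork&qt_g=Search+this+group) turns up a lot of good discussion and shows how little the POSIX standard really provides to solve this problem.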

There's also the issue that one should only call async-signal-safe functions from pthread_atfork() handlers. The handler where this matters most appears to be the child handler, and there only a memcpy() is done.

Does this work? Is there a better way of dealing with the requirements of our shared library?

asked Apr 12 '10 06:04

Blair Zajac

2 Answers

Congratulations, you found a defect in the standard. pthread_atfork is fundamentally unable to solve the problem it was created to solve with mutexes, because the handler in the child is not permitted to perform any operations on them:

It cannot unlock them, because the caller would be the new main thread in the newly created child process, and that's not the same thread as the thread (in the parent) that obtained the lock.
It cannot destroy them, because they are locked.
It cannot re-initialize them, because they have not been destroyed.

One potential workaround is to use POSIX semaphores in place of mutexes here. A semaphore does not have an owner, so if the parent process locks it (sem_wait), both the parent and child processes can unlock (sem_post) their respective copies without invoking any undefined behavior.

As a nice aside, sem_post is async-signal-safe and thus definitely legal for the child to use.
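A minimal sketch of the semaphore workaround, assuming an unnamed, process-private semaphore created with sem_init() (Linux supports this; some platforms do not). The names state_lock, lib_once, lock_state, and call_in_forked_child are illustrative only, and my_library_call mirrors the function in the question:

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static sem_t state_lock;                          /* binary semaphore used as a lock */
static pthread_once_t lib_once = PTHREAD_ONCE_INIT;

static void lock_state(void)
{
    /* sem_wait() may be interrupted by a signal; retry in that case. */
    while (sem_wait(&state_lock) == -1 && errno == EINTR)
        continue;
}

static void unlock_state(void) { sem_post(&state_lock); }

static void lib_init(void)
{
    sem_init(&state_lock, 0, 1);                  /* pshared = 0, initial value 1 */
    /* prepare takes the lock; parent and child each release their own copy. */
    pthread_atfork(lock_state, unlock_state, unlock_state);
}

int my_library_call(int arg)
{
    pthread_once(&lib_once, lib_init);
    lock_state();
    int result = 2 * arg;                         /* work that requires the lock */
    unlock_state();
    return result;
}

/* Forks, calls the library in the child, and returns the child's result
 * via its exit status (so the result must fit in 0..255), or -1 on error. */
int call_in_forked_child(int arg)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(my_library_call(arg));

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Because a semaphore has no owner, the sem_post() in the child handler is well defined even though the child's only thread never called sem_wait(). This only covers the library's own lock; it does nothing for locks held inside third-party code such as the Ice runtime.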

answered Oct 06 '22 00:10

R.. GitHub STOP HELPING ICE

I consider this a bug in the programs calling fork(). In a multi-threaded process, the child process should call only async-signal-safe functions. If a program wants to fork without exec, it should do so before creating threads.

There isn't really a good solution for threaded fork()/pthread_atfork(). Some parts of it appear to work, but this is not portable and is liable to break across OS versions.
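For illustration, a sketch of the pattern this describes, in which the child calls only async-signal-safe functions (execv() and _exit()) after fork(). The helper name run_program is hypothetical:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs the program at argv[0] in a child process and returns its exit
 * status, 127 if the exec failed, or -1 on any other error. */
int run_program(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: no locks, no malloc, no stdio. Only exec or exit. */
        execv(argv[0], argv);
        _exit(127);                    /* reached only if execv() failed */
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Since the new program image replaces the child's address space, whatever mutex state was copied from the parent no longer matters once execv() succeeds.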

answered Oct 06 '22 00:10

jilles
