I wish to understand what they mean here. Why would this program "hang"?
From https://bisqwit.iki.fi/story/howto/openmp/
OpenMP and fork()

It is worth mentioning that using OpenMP in a program that calls fork() requires special consideration. This problem only affects GCC; ICC is not affected. If your program intends to become a background process using daemonize() or other similar means, you must not use the OpenMP features before the fork. After OpenMP features are utilized, a fork is only allowed if the child process does not use OpenMP features, or it does so as a completely new process (such as after exec()).

This is an example of an erroneous program:
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

void a(){
    #pragma omp parallel num_threads(2)
    {
        puts("para_a"); // output twice
    }
    puts("a ended");    // output once
}
void b(){
    #pragma omp parallel num_threads(2)
    {
        puts("para_b");
    }
    puts("b ended");
}
int main(){
    a();                // Invokes OpenMP features (parent process)
    int p = fork();
    if(!p){
        b();            // ERROR: Uses OpenMP again, but in child process
        _exit(0);
    }
    wait(NULL);
    return 0;
}
When run, this program hangs, never reaching the line that outputs "b ended". There is currently no workaround, as the libgomp API does not specify functions that can be used to prepare for a call to fork().
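By contrast, here is a minimal sketch (mine, not from the article) of one ordering the quoted rule allows: forking before any OpenMP construct has executed, so each process initializes its own OpenMP runtime independently:

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

void a(){
    #pragma omp parallel num_threads(2)
    { puts("para_a"); }
    puts("a ended");
}
void b(){
    #pragma omp parallel num_threads(2)
    { puts("para_b"); }
    puts("b ended");
}
int main(){
    int p = fork();     // fork happens before any OpenMP construct runs
    if(!p){
        b();            // child: its first OpenMP use is after the fork
        _exit(0);
    }
    a();                // parent: likewise uses OpenMP only after the fork
    wait(NULL);
    return 0;
}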
OpenMP uses a fork-join model of parallel execution. When a thread encounters a parallel construct, the thread creates a team composed of itself and some additional (possibly zero) number of threads. The encountering thread becomes the master of the new team.
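As a small illustration of the fork-join model (a sketch of my own, not from the specification), each thread in the team can query its own number and the team size:

#include <stdio.h>
#include <omp.h>

int main(){
    // Serial part: only the initial (master) thread runs here.
    puts("before parallel region");

    #pragma omp parallel num_threads(4)
    {
        // "Fork": a team of (up to) 4 threads executes this block.
        printf("thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }   // "Join": the team disbands; the master continues alone.

    puts("after parallel region");
    return 0;
}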
Avoiding Race Conditions

One approach to avoiding a race condition in such parallel code is to give each thread its own local copy of the accumulator variable (integral in the classic numerical-integration example) instead of a single global variable shared by all the threads.
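A hedged sketch of that idea, assuming the classic numerical-integration example that the variable name integral comes from; OpenMP's reduction clause gives each thread a private copy and combines them at the join:

#include <stdio.h>

int main(){
    const int n = 1000000;
    const double h = 1.0 / n;
    double integral = 0.0;  // shared result

    // reduction(+:integral) gives each thread a private copy of
    // integral; the private copies are summed at the join, so no
    // two threads ever write the shared variable concurrently.
    #pragma omp parallel for reduction(+:integral)
    for(int i = 0; i < n; i++){
        double x = (i + 0.5) * h;
        integral += 4.0 / (1.0 + x * x);  // integrand for pi
    }

    printf("pi ~ %.10f\n", integral * h);
    return 0;
}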
Incremental parallelism: you can parallelize one part of the program at a time, and no dramatic change to the code is needed. Unified code for both serial and parallel applications: OpenMP pragmas are simply ignored when the code is built with a compiler that does not support them, so the same source serves both cases.
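For instance (my own sketch), this source builds with GCC both with and without -fopenmp and prints the same result either way:

#include <stdio.h>

int main(){
    double sum = 0.0;

    // With -fopenmp this loop runs in parallel; without it, the
    // pragma is ignored and the loop runs serially, producing the
    // same result in both builds.
    #pragma omp parallel for reduction(+:sum)
    for(int i = 1; i <= 100; i++)
        sum += i;

    printf("sum = %.0f\n", sum);  // 5050 in both builds
    return 0;
}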
The code as posted violates the POSIX standard.
The POSIX standard for fork() states:

A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called.
Running OpenMP-parallelized code in the child clearly violates the above restriction.
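To make that restriction concrete, here is a minimal sketch (mine, not from the standard) of the little a forked child of a multi-threaded process may do before exec:

#include <unistd.h>
#include <sys/wait.h>

// Sketch: what the child of a multi-threaded process may safely do.
// (Thread creation is omitted; imagine other threads are running.)
int main(){
    pid_t p = fork();
    if(p == 0){
        // printf(), malloc(), etc. are NOT async-signal-safe: another
        // thread may have held their internal locks at the moment of
        // the fork, and in the child those locks are never released.
        const char msg[] = "child: exec'ing a fresh program\n";
        write(STDERR_FILENO, msg, sizeof msg - 1);  // async-signal-safe

        execl("/bin/echo", "echo", "hello from the new process", (char *)NULL);
        _exit(127);  // async-signal-safe; reached only if exec failed
    }
    wait(NULL);
    return 0;
}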
To expand on Andrew Henle's answer, what fork(2) does is create a second process that shares the entire memory space of the calling thread via copy-on-write (CoW) memory mappings. The child process is in an awkward situation: it is a replica of the parent thread with the same state (except for the return value of the system call and a few other things such as timers and resource-usage counters) and access to all of its memory and open file descriptors, but without any thread of execution besides the one that made the fork(2) call. While with some precautions this can be used as a crude form of multithreading (and it was used for that purpose before true LWPs were introduced in Unix), in 99% of cases fork(2) serves a single purpose: to spawn child processes, where the child calls execve(2) (or one of its front-ends in the standard C library) immediately after the fork.

In recognition of that fact, there is an even more extreme version called vfork(2) that doesn't even create CoW mappings of the parent's memory but directly uses its page tables, effectively creating a hybrid between a standalone process and a thread. The child in that case is not even allowed to make async-signal-safe function calls, because it operates on the parent's stack.
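A minimal sketch of that dominant fork-then-exec pattern (the spawned program, /bin/ls here, is just a placeholder):

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(){
    pid_t pid = fork();
    if(pid < 0){
        perror("fork");
        return 1;
    }
    if(pid == 0){
        // Child: immediately replace the CoW replica with a new
        // program image; per the quoted article, that new program
        // may freely use OpenMP, since it starts from a clean state.
        char *argv[] = { "ls", "-l", NULL };
        execv("/bin/ls", argv);
        _exit(127);  // only reached if execv failed
    }
    // Parent: wait for the child to finish.
    int status;
    waitpid(pid, &status, 0);
    return 0;
}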
Note that the OpenMP specification does not cover any interaction with other threading and/or process control mechanisms, thus, even if it might work with some OpenMP implementations, your example is not a correct OpenMP program.
I hit on this in the following scenario: spawning worker sub-processes from Python (e.g. with the multiprocessing module), where each task calls into a C++ extension that is parallelized internally with OpenMP.

(Using Python multithreading doesn't work, because the code in each parallel task contains a significant amount of Python code, which becomes essentially single-threaded due to the GIL. For the same reason, it doesn't make sense to just run the code serially and benefit only from the parallelization inside the C++ extension.)
Note that invoking parallel functions such as numpy's corrcoef somehow does manage to use parallel processing in each of the sub-processes; presumably it doesn't use OpenMP to do it.
"It would have been nice" in OpenMP had a "reset everything, forget about all previously spawned threads, as if we just started execution". Then we could have invoked this function on the forked child processes and use OpenMP in each one (being careful to reduce the number of used OpenMP threads so as not to overload the system).