I wish to understand what they mean here. Why would this program "hang"?
From https://bisqwit.iki.fi/story/howto/openmp/
OpenMP and fork()

It is worth mentioning that using OpenMP in a program that calls fork() requires special consideration. This problem only affects GCC; ICC is not affected. If your program intends to become a background process using daemonize() or other similar means, you must not use the OpenMP features before the fork. After OpenMP features are utilized, a fork is only allowed if the child process does not use OpenMP features, or it does so as a completely new process (such as after exec()).

This is an example of an erroneous program:
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

void a(){
    #pragma omp parallel num_threads(2)
    {
        puts("para_a"); // output twice
    }
    puts("a ended");    // output once
}
void b(){
    #pragma omp parallel num_threads(2)
    {
        puts("para_b");
    }
    puts("b ended");
}
int main(){
    a();                // Invokes OpenMP features (parent process)
    int p = fork();
    if(!p){
        b();            // ERROR: Uses OpenMP again, but in child process
        _exit(0);
    }
    wait(NULL);
    return 0;
}
When run, this program hangs, never reaching the line that outputs "b ended". There is currently no workaround, as the libgomp API does not specify functions that can be used to prepare for a call to fork().
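By contrast, here is a minimal sketch (mine, not from the article) of one ordering the quoted rule allows: forking before any OpenMP construct has executed, so each process initializes its own OpenMP runtime independently:

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

void a(){
    #pragma omp parallel num_threads(2)
    { puts("para_a"); }
    puts("a ended");
}
void b(){
    #pragma omp parallel num_threads(2)
    { puts("para_b"); }
    puts("b ended");
}
int main(){
    int p = fork();     // fork happens before any OpenMP construct runs
    if(!p){
        b();            // child: its first OpenMP use is after the fork
        _exit(0);
    }
    a();                // parent: likewise uses OpenMP only after the fork
    wait(NULL);
    return 0;
}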
OpenMP uses a fork-join model of parallel execution. When a thread encounters a parallel construct, the thread creates a team composed of itself and some additional (possibly zero) number of threads. The encountering thread becomes the master of the new team.
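As a small illustration of the fork-join model (a sketch of my own, not from the specification), each thread in the team can query its own number and the team size:

#include <stdio.h>
#include <omp.h>

int main(){
    // Serial part: only the initial (master) thread runs here.
    puts("before parallel region");

    #pragma omp parallel num_threads(4)
    {
        // "Fork": a team of (up to) 4 threads executes this block.
        printf("thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }   // "Join": the team disbands; the master continues alone.

    puts("after parallel region");
    return 0;
}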
Avoiding Race Conditions

One approach to avoiding a race condition in such parallel code is to give each thread its own local copy of the accumulator variable (integral in the classic numerical-integration example) instead of a single global variable shared by all the threads.
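A hedged sketch of that idea, assuming the classic numerical-integration example that the variable name integral comes from; OpenMP's reduction clause gives each thread a private copy and combines them at the join:

#include <stdio.h>

int main(){
    const int n = 1000000;
    const double h = 1.0 / n;
    double integral = 0.0;  // shared result

    // reduction(+:integral) gives each thread a private copy of
    // integral; the private copies are summed at the join, so no
    // two threads ever write the shared variable concurrently.
    #pragma omp parallel for reduction(+:integral)
    for(int i = 0; i < n; i++){
        double x = (i + 0.5) * h;
        integral += 4.0 / (1.0 + x * x);  // integrand for pi
    }

    printf("pi ~ %.10f\n", integral * h);
    return 0;
}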
Incremental parallelism: you can parallelize one part of the program at a time, and no dramatic change to the code is needed. Unified code for both serial and parallel applications: OpenMP pragmas are simply ignored when the code is built with a compiler that does not support them, so the same source serves both cases.
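For instance (my own sketch), this source builds with GCC both with and without -fopenmp and prints the same result either way:

#include <stdio.h>

int main(){
    double sum = 0.0;

    // With -fopenmp this loop runs in parallel; without it, the
    // pragma is ignored and the loop runs serially, producing the
    // same result in both builds.
    #pragma omp parallel for reduction(+:sum)
    for(int i = 1; i <= 100; i++)
        sum += i;

    printf("sum = %.0f\n", sum);  // 5050 in both builds
    return 0;
}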
The code as posted violates the POSIX standard.
The POSIX standard for fork() states:

A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called.
Running OpenMP-parallelized code in the child clearly violates the above restriction.
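To make that restriction concrete, here is a minimal sketch (mine, not from the standard) of the little a forked child of a multi-threaded process may do before exec:

#include <unistd.h>
#include <sys/wait.h>

// Sketch: what the child of a multi-threaded process may safely do.
// (Thread creation is omitted; imagine other threads are running.)
int main(){
    pid_t p = fork();
    if(p == 0){
        // printf(), malloc(), etc. are NOT async-signal-safe: another
        // thread may have held their internal locks at the moment of
        // the fork, and in the child those locks are never released.
        const char msg[] = "child: exec'ing a fresh program\n";
        write(STDERR_FILENO, msg, sizeof msg - 1);  // async-signal-safe

        execl("/bin/echo", "echo", "hello from the new process", (char *)NULL);
        _exit(127);  // async-signal-safe; reached only if exec failed
    }
    wait(NULL);
    return 0;
}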
To expand on Andrew Henle's answer, what fork(2) does is create a second process that shares the entire memory space of the calling thread via copy-on-write (CoW) memory mappings. The child process is in an awkward situation: it is a replica of the parent thread with the same state (except for the return value of the system call and a few other things such as timers and resource-usage counters) and access to all of its memory and open file descriptors, but without any thread of execution besides the one that made the fork(2) call. While with some precautions this can be used as a crude form of multithreading (and it was used for that purpose before true LWPs were introduced in Unix), in 99% of cases fork(2) serves a single purpose: to spawn child processes, where the child calls execve(2) (or one of its front-ends in the standard C library) immediately after the fork.

In recognition of that fact, there is an even more extreme version called vfork(2) that doesn't even create CoW mappings of the parent's memory but directly uses its page tables, effectively creating a hybrid between a standalone process and a thread. The child in that case is not even allowed to make async-signal-safe function calls, because it operates on the parent's stack.
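A minimal sketch of that dominant fork-then-exec pattern (the spawned program, /bin/ls here, is just a placeholder):

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(){
    pid_t pid = fork();
    if(pid < 0){
        perror("fork");
        return 1;
    }
    if(pid == 0){
        // Child: immediately replace the CoW replica with a new
        // program image; per the quoted article, that new program
        // may freely use OpenMP, since it starts from a clean state.
        char *argv[] = { "ls", "-l", NULL };
        execv("/bin/ls", argv);
        _exit(127);  // only reached if execv failed
    }
    // Parent: wait for the child to finish.
    int status;
    waitpid(pid, &status, 0);
    return 0;
}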
Note that the OpenMP specification does not cover any interaction with other threading and/or process control mechanisms, thus, even if it might work with some OpenMP implementations, your example is not a correct OpenMP program.
I hit on this in the following scenario: spawning worker sub-processes from Python (e.g. with the multiprocessing module), where each task calls into a C++ extension that is parallelized internally with OpenMP.

(Using Python multithreading doesn't work, because the code in each parallel task contains a significant amount of Python code, which becomes essentially single-threaded due to the GIL. For the same reason, it doesn't make sense to just run the code serially and benefit only from the parallelization inside the C++ extension.)
Note that invoking parallel functions such as numpy's corrcoef somehow does manage to use parallel processing in each of the sub-processes; presumably it doesn't use OpenMP to do it.
"It would have been nice" in OpenMP had a "reset everything, forget about all previously spawned threads, as if we just started execution". Then we could have invoked this function on the forked child processes and use OpenMP in each one (being careful to reduce the number of used OpenMP threads so as not to overload the system).