
Understanding OpenMP shortcomings regarding fork

Tags:

c

openmp

I wish to understand what they mean here. Why would this program "hang"?

From https://bisqwit.iki.fi/story/howto/openmp/

OpenMP and fork() It is worth mentioning that using OpenMP in a program that calls fork() requires special consideration. This problem only affects GCC; ICC is not affected. If your program intends to become a background process using daemonize() or other similar means, you must not use the OpenMP features before the fork. After OpenMP features are utilized, a fork is only allowed if the child process does not use OpenMP features, or it does so as a completely new process (such as after exec()).

This is an example of an erroneous program:

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

void a(){
    #pragma omp parallel num_threads(2)
    {
        puts("para_a"); // output twice
    }
    puts("a ended"); // output once
}

void b(){
    #pragma omp parallel num_threads(2)
    {
        puts("para_b");
    }
    puts("b ended");
}

int main(){
    a();   // Invokes OpenMP features (parent process)
    int p = fork();
    if(!p){
        b(); // ERROR: Uses OpenMP again, but in child process
        _exit(0);
    }
    wait(NULL);
    return 0;
}

When run, this program hangs, never reaching the line that outputs "b ended". There is currently no workaround as the libgomp API does not specify functions that can be used to prepare for a call to fork().

Aquarius_Girl asked Mar 01 '18



3 Answers

The code as posted violates the POSIX standard.

The POSIX fork() standard states:

A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called.

Running OpenMP-parallelized code in the child clearly violates the above restriction.

Andrew Henle answered Oct 11 '22


To expand on Andrew Henle's answer, what fork(2) does is create a second process that shares the entire memory space of the calling thread via copy-on-write (CoW) memory mappings. The child process is in an awkward situation: it is a replica of the parent thread with the same state (except for the return value of the system call and a few other things, such as timers and resource-usage counters) and access to all of its memory and open file descriptors, but without any thread of execution besides the one that made the fork(2) call.

While with some precautions this can be used as a crude form of multithreading (and it was used for that purpose before true LWPs were introduced in Unix), in 99% of cases fork(2) serves a single purpose: spawning child processes, where the child calls execve(2) (or one of its front-ends in the standard C library) immediately after the fork. In recognition of that fact, there is an even more extreme version called vfork(2) that doesn't even create CoW mappings of the parent's memory but directly uses its page tables, effectively creating a hybrid between a standalone process and a thread. The child in that case is not even allowed to make async-signal-safe function calls, because it operates on the parent's stack.

Note that the OpenMP specification does not cover any interaction with other threading and/or process control mechanisms, thus, even if it might work with some OpenMP implementations, your example is not a correct OpenMP program.

Hristo Iliev answered Oct 11 '22


I hit on this in the following scenario:

  • Using Python
  • Implementing a C++ python extension
  • Using OpenMP within this extension
  • Using Python multiprocessing parallel execution (which seems to use fork and copy-on-write to provide "free" poor-man-read-only-shared-memory between the sub-processes).
  • Invoking the extension within the sub-processes => Deadlock.

(Using Python multithreading doesn't work, because the code in each parallel task contains a significant amount of Python code, which becomes essentially single-threaded due to the GIL. For the same reason, it doesn't make sense to just run the code serially and only benefit from the parallelization inside the C++ extension.)

Note that invoking parallel functions such as numpy's corrcoef somehow does manage to use parallel processing in each of the sub-processes. Presumably it doesn't use OpenMP to do it.

It would have been nice if OpenMP had a "reset everything, forget about all previously spawned threads, as if we just started execution" function. Then we could have invoked it in the forked child processes and used OpenMP in each one (being careful to reduce the number of OpenMP threads used so as not to overload the system).

Oren Ben-Kiki answered Oct 11 '22