Signal handling in OpenMP parallel program

I have a program that uses a POSIX timer (timer_create()). Essentially, the program sets a timer and starts a lengthy (potentially infinite) computation. When the timer expires and the signal handler is called, the handler prints the best result computed so far and quits the program.

I am considering doing the computation in parallel using OpenMP, because that should speed it up.

With pthreads, there are dedicated functions for things such as setting the signal mask of each thread. Does OpenMP provide such control, or do I have to accept the fact that the signal can be delivered to any of the threads OpenMP creates?

Also, if my handler is called while I am in a parallel section of my code, can it still safely kill the application (exit(0);) and do things like locking OpenMP locks?

user7610 asked Nov 15 '11 14:11



2 Answers

The OpenMP 3.1 standard says nothing about signals.

As far as I know, every popular OpenMP implementation on Linux/UNIX is based on pthreads, so an OpenMP thread is a pthread, and the generic rules for pthreads and signals apply.

Does OpenMP provide such control

There is no OpenMP-specific control, but you can try to use the pthread-level control. The only problem is knowing how many OpenMP threads are used and where to place the controlling statement.

the signal can be delivered to any of the threads OpenMP creates?

By default, yes: a process-directed signal can be delivered to any thread that does not have it blocked.
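
The pthread-level control mentioned above could look roughly like the following. This is only a sketch, and it assumes the OpenMP runtime maps each OpenMP thread onto a pthread, which the standard does not promise. The helper name route_signal_to_thread0 is hypothetical; every thread would call it once at the start of the parallel region, passing omp_get_thread_num().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>

/* Hypothetical helper (not part of OpenMP or pthreads).
 * Thread 0 unblocks signo and every other thread blocks it, so a
 * process-directed signo can only be delivered to thread 0.
 * Returns 0 on success or an errno-style error number. */
int route_signal_to_thread0(int signo, int thread_num)
{
    sigset_t set;

    sigemptyset(&set);
    if (sigaddset(&set, signo) != 0)
        return EINVAL;              /* signo is not a valid signal number */

    /* pthread_sigmask changes the mask of the calling thread only. */
    return pthread_sigmask(thread_num == 0 ? SIG_UNBLOCK : SIG_BLOCK,
                           &set, NULL);
}
```

Inside a parallel region this would be called as route_signal_to_thread0(SIGALRM, omp_get_thread_num()); before the real work starts. Threads that the runtime creates later start with the mask of whichever thread created them, so the call may need repeating in each new region.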

my handler is called,

The usual rules for signal handlers still apply. The functions allowed in a signal handler are listed at http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_04.html (at the end of the page).

Note that printf is not allowed (write is). You can use printf only if you know that no thread is inside printf at the moment the signal arrives (e.g. you have no printf in the parallel region).

can it still safely kill the application (exit(0);)

Yes, with a caveat: abort() and _exit() are allowed from a handler, whereas exit() itself is not on the async-signal-safe list.

Linux/Unix will terminate all threads of the process when any thread calls exit, _exit or abort.
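
As a sketch of a handler that stays within those rules, the code below formats a number by hand and uses only write() and _exit(). The names best_result, format_long and on_timer are hypothetical stand-ins for the asker's program, and the sketch assumes the best result fits in a sig_atomic_t.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for the program's "best result so far". */
static volatile sig_atomic_t best_result = 0;

/* Write v in decimal into buf without any printf-family call.
 * Returns the number of characters written (no terminating NUL). */
size_t format_long(long v, char *buf, size_t cap)
{
    char tmp[24];
    size_t n = 0, len = 0;
    unsigned long u = (v < 0) ? 0UL - (unsigned long)v : (unsigned long)v;

    do {                            /* digits come out in reverse order */
        tmp[n++] = (char)('0' + u % 10);
        u /= 10;
    } while (u != 0);
    if (v < 0)
        tmp[n++] = '-';
    while (n > 0 && len < cap)      /* copy back in the right order */
        buf[len++] = tmp[--n];
    return len;
}

/* Signal handler: prints the best result and terminates the process. */
void on_timer(int signo)
{
    static const char prefix[] = "best result: ";
    char msg[64];
    size_t len = 0;

    (void)signo;
    while (len < sizeof prefix - 1) {
        msg[len] = prefix[len];
        len++;
    }
    len += format_long((long)best_result, msg + len, sizeof msg - len - 1);
    msg[len++] = '\n';

    /* write() and _exit() are async-signal-safe; printf() and exit() are not. */
    if (write(STDOUT_FILENO, msg, len) < 0)
        _exit(1);
    _exit(0);
}
```

The computation would store into best_result whenever it improves, and on_timer would be installed with sigaction() for the signal the timer raises.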

and do things like locking OpenMP locks?

You should not, but if you know that the lock will not be held at the time the signal handler runs, you can try to do this.

!! UPDATE

There is an example of adapting signal handling to OpenMP at http://www.cs.colostate.edu/~cs675/OpenMPvsThreads.pdf ("OpenMP versus Threading in C/C++"). In short: set a flag in the handler and have every thread check this flag at every Nth loop iteration.

Adapting a signal based exception mechanism to a parallel region

Something that occurs more with C/C++ applications than with Fortran applications is that the program uses a sophisticated user interface. Genehunter is a simple example where the user may interrupt the computation of one family tree by pressing control-C so that it can go on to the next family tree in a clinical database about the disease. The premature termination is handled in the serial version by a C++ like exception mechanism involving a signal handler, setjump, and longjump. OpenMP does not permit unstructured control flow to cross a parallel construct boundary. We modified the exception handling in the OpenMP version by changing the interrupt handler into a polling mechanism. The thread that catches the control-C signal sets a shared flag. All threads check the flag at the beginning of the loop by calling the routine has_hit_interrupt( ) and skip the iteration if it is set. When the loop ends, the master checks the flag and can easily execute the longjump to complete the exceptional exit (See Figure 1.)
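
The polling mechanism described in that passage might be sketched as below. Only the name has_hit_interrupt comes from the quoted text; install_interrupt_flag, run_iterations and the placeholder loop body are assumptions made for illustration. The shared flag is a plain volatile sig_atomic_t, which mirrors the paper's approach; a stricter version might use C11 atomics instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Shared flag, set by whichever thread happens to run the handler. */
static volatile sig_atomic_t interrupt_flag = 0;

static void on_interrupt(int signo)
{
    (void)signo;
    interrupt_flag = 1;     /* the handler does nothing else */
}

/* Install the flag-setting handler for signo; returns 0 on success. */
int install_interrupt_flag(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_interrupt;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

int has_hit_interrupt(void)
{
    return interrupt_flag != 0;
}

/* Runs up to n iterations and returns how many were actually performed.
 * Without OpenMP enabled the pragma is ignored and the loop runs serially. */
long run_iterations(long n)
{
    long done = 0;

    #pragma omp parallel for reduction(+:done)
    for (long i = 0; i < n; i++) {
        if (has_hit_interrupt())
            continue;       /* skip the iteration; no jump out of the region */
        done += 1;          /* placeholder for the real work */
    }
    return done;            /* the caller checks the flag again here */
}
```

After the loop, the calling (master) thread can test has_hit_interrupt() once more and take the exceptional exit path from outside the parallel region.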

osgx answered Oct 18 '22 12:10



This is a bit late, but hopefully this example code will help others in a similar position!


As osgx mentioned, OpenMP is silent on the issue of signals, but since OpenMP is often implemented with pthreads on POSIX systems, we can use a pthread signal approach.

For heavy computations using OpenMP, it is likely that there are only a few locations where computation can actually be safely halted. Therefore, for the case where you want to obtain premature results we can use synchronous signal handling to safely do this. An additional advantage is that this lets us accept the signal from a specific OpenMP thread (in the example code below, we choose the master thread). On catching the signal, we simply set a flag indicating that computation should stop. Each thread should then make sure to periodically check this flag when convenient, and then wrap up its share of the workload.

By using this synchronous approach, we allow computation to exit gracefully and with very minimal change to the algorithm. On the other hand, the asynchronous signal handler approach that the question asks for may not be appropriate, as it would likely be difficult to collate the current working states of each thread into a coherent result. One disadvantage of the synchronous approach, though, is that computation can take a noticeable amount of time to come to a stop.

The signal checking apparatus consists of three parts:

  • Blocking the relevant signals. This should be done outside of the omp parallel region (ideally before the first one, since a thread takes its signal mask from its creator at creation time) so that each OpenMP thread (pthread) will inherit this same blocking behaviour.
  • Polling for the desired signals from the master thread. One can use sigtimedwait for this, but some systems (e.g. macOS) don't support it. More portably, we can use sigpending to poll for any blocked signals and then double check that the blocked signals are what we're expecting before accepting them synchronously using sigwait (which should return immediately here, unless some other part of the program is creating a race condition). We finally set the relevant flag.
  • We should remove our signal mask at the end (optionally with one final check for signals).
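
For systems that do provide sigtimedwait, the polling step might be sketched as follows. The function name poll_blocked_signal is hypothetical, and the sketch assumes the signals in set were already blocked as described in the first step.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <time.h>

/* Hypothetical helper: non-blocking check for a pending, blocked signal.
 * Returns the signal number if one from set was pending (and consumes it),
 * 0 if nothing was pending, or -1 on any other error. */
int poll_blocked_signal(const sigset_t *set)
{
    /* A zero timeout makes sigtimedwait return immediately. */
    const struct timespec no_wait = { .tv_sec = 0, .tv_nsec = 0 };
    int signo = sigtimedwait(set, NULL, &no_wait);

    if (signo > 0)
        return signo;
    /* EAGAIN: nothing pending. EINTR: interrupted by an unrelated handler. */
    return (errno == EAGAIN || errno == EINTR) ? 0 : -1;
}
```

The master thread would call this every so many iterations and set the shared flag when the return value is positive.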

There are some important performance considerations and caveats:

  • Assuming that each inner loop iteration is small, executing the signal checking syscalls is expensive. In the example code, we check for signals only every 10 million (per-thread) iterations, corresponding to perhaps a couple of seconds of wall time.
  • omp for loops cannot be broken out of [1], and so you must either spin for the remainder of the iterations or rewrite the loop using more basic OpenMP primitives. Regular loops (such as inner loops of an outer parallel loop) can be broken out of just fine.
  • If only the master thread can check for signals, then this may create an issue in programs where the master thread finishes well before the other threads. In this scenario, these other threads will be uninterruptible. To address this, you could 'pass the baton' of signal checking as each thread completes its workload, or the master thread could be forced to keep running and polling until all other threads complete [2].
  • On some architectures such as NUMA HPCs, the time to check the 'global' signalled flag may be quite expensive, so take care when deciding when and where to check or manipulate the flag. For the spin loop section, for example, one may wish to locally cache the flag when it becomes true.

Here is the example code:

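#include <omp.h>      // omp_get_thread_num
#include <signal.h>
#include <stdbool.h>  // false, true
#include <stddef.h>   // size_t
#include <stdio.h>    // printf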

void calculate() {
    _Bool signalled = false;
    int sigcaught;
    size_t steps_tot = 0;

    // block signals of interest (SIGINT and SIGTERM here)
    sigset_t oldmask, newmask, sigpend;
    sigemptyset(&newmask);
    sigaddset(&newmask, SIGINT);
    sigaddset(&newmask, SIGTERM);
    sigprocmask(SIG_BLOCK, &newmask, &oldmask);

    #pragma omp parallel
    {
        int rank = omp_get_thread_num();
        size_t steps = 0;

        // keep improving result forever, unless signalled
        while (!signalled) {
            #pragma omp for
            for (size_t i = 0; i < 10000; i++) {
                // we can't break from an omp for loop...
                // instead, spin away the rest of the iterations
                if (signalled) continue;

                for (size_t j = 0; j < 1000000; j++, steps++) {
                    // ***
                    // heavy computation...
                    // ***

                    // check for signal every 10 million steps
                    if (steps % 10000000 == 0) {

                        // master thread; poll for signal
                        if (rank == 0) {
                            sigpending(&sigpend);
                            if (sigismember(&sigpend, SIGINT) || sigismember(&sigpend, SIGTERM)) {
                                if (sigwait(&newmask, &sigcaught) == 0) {
                                    printf("Interrupted by %d...\n", sigcaught);
                                    signalled = true;
                                }
                            }
                        }

                        // all threads; stop computing
                        if (signalled) break;
                    }
                }
            }
        }

        #pragma omp atomic
        steps_tot += steps;
    }

    printf("The result is ... after %zu steps\n", steps_tot);

    // optional cleanup
    sigprocmask(SIG_SETMASK, &oldmask, NULL);
}

If using C++, you may find the following class useful...

#include <signal.h>
#include <vector>

class Unterminable {
    sigset_t oldmask, newmask;
    std::vector<int> signals;

public:
    Unterminable(std::vector<int> signals) : signals(signals) {
        sigemptyset(&newmask);
        for (int signal : signals)
            sigaddset(&newmask, signal);
        sigprocmask(SIG_BLOCK, &newmask, &oldmask);
    }

    Unterminable() : Unterminable({SIGINT, SIGTERM}) {}

    // this can be made more efficient by using sigandset,
    // but sigandset is not particularly portable
    int poll() {
        sigset_t sigpend;
        sigpending(&sigpend);
        for (int signal : signals) {
            if (sigismember(&sigpend, signal)) {
                int sigret;
                if (sigwait(&newmask, &sigret) == 0)
                    return sigret;
                break;
            }
        }
        return -1;
    }

    ~Unterminable() {
        sigprocmask(SIG_SETMASK, &oldmask, NULL);
    }
};

The blocking part of calculate() can then be replaced by Unterminable unterm; (without parentheses, since Unterminable unterm(); would declare a function rather than an object), and the signal checking part by if ((sigcaught = unterm.poll()) > 0) {...}. Unblocking the signals is automatically performed when unterm goes out of scope.


[1] This is not strictly true. OpenMP (since version 4.0) offers limited support for performing a 'parallel break' in the form of cancellation points. Cancellation generally has to be enabled at run time, for instance by setting the OMP_CANCELLATION environment variable to true. If you choose to use cancellation points in your parallel loops, make sure you know exactly where the implicit cancellation points are, so that your computation data will be coherent upon cancellation.

[2] Personally, I keep a count of how many threads have completed the for loop and, if the master thread completes the loop without catching a signal, it keeps polling for signals until either it catches a signal or all threads complete the loop. To do this, make sure to mark the for loop nowait.

sourtin answered Oct 18 '22 12:10
