When using mpirun, is it possible to catch signals (for example, the SIGINT generated by ^C) in the code being run?
For example, I'm running a parallelized Python code. I can catch KeyboardInterrupt with an except clause when running python blah.py by itself, but I can't when doing mpirun -np 1 python blah.py.
Does anyone have a suggestion? Even finding how to catch signals in a C or C++ compiled program would be a helpful start.
If I send a signal directly to the spawned Python processes, they handle it properly; however, signals sent to the parent orterun process (e.g. from exceeding wall time on a cluster, or pressing control-C in a terminal) kill everything immediately.
I think it is really implementation dependent.
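In SLURM, I tried sbatch --signal USR1@30 to send SIGUSR1 (whose signal number is 30, 10 or 16, depending on the platform) to the program launched by the srun commands. Here the @30 is the lead time: the signal is sent about 30 seconds before the job's time limit. The process received SIGUSR1 = 10.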
For IBM Platform MPI, according to https://www.ibm.com/support/knowledgecenter/en/SSF4ZA_9.1.4/pmpi_guide/signal_propagation.html, SIGINT, SIGUSR1 and SIGUSR2 are propagated to the processes.
In MPICH, SIGUSR1 is used by the process manager for internal notification of abnormal failures. ref: http://lists.mpich.org/pipermail/discuss/2014-October/003242.html
Open MPI, on the other hand, will forward SIGUSR1 and SIGUSR2 from mpiexec to the other processes. ref: http://www.open-mpi.org/doc/v1.6/man1/mpirun.1.php#sect14
For Intel MPI, according to https://software.intel.com/en-us/mpi-developer-reference-linux-hydra-environment-variables, I_MPI_JOB_SIGNAL_PROPAGATION and I_MPI_JOB_TIMEOUT_SIGNAL can be set to control which signals are sent to the processes.
Another thing worth noting: many Python scripts invoke other libraries or code through Cython, and if SIGUSR1 is caught by the sub-process, something unwanted might happen.
If you use mpirun --nw, then mpirun itself should terminate as soon as it's started the subprocesses, instead of waiting for their termination; if that's acceptable then I believe your processes would be able to catch their own signals.