I have a program that, when it receives a SIGUSR1, writes some output and quits. I'm trying to get sbatch to notify this program before timing out.
I enqueue the program using:
sbatch -t 06:00:00 --signal=USR1 ... --wrap my_program
but my_program never receives the signal. I've tried sending signals while the program is running, with scancel -s USR1 <JOBID>, but without any success. I also tried scancel --full, but it kills the wrapper and my_program is not notified.
One option is to write a bash file that wraps my_program and traps the signal, forwarding it to my_program (similar to this example), but I don't need this cumbersome bash file for anything else. Also, the sbatch --signal documentation very clearly says that, when you want to notify the enveloping bash file, you need to specify signal=B:, so I believe that the bash wrapper is not really necessary.
So, is there a way to send a SIGUSR1 signal to a program enqueued using sbatch --wrap?
Your command is sending the USR1 to the shell created by the --wrap. However, if you want the signal to be caught and processed, you're going to need to write the shell functions to handle the signal and that's probably too much for a --wrap command.
These folks are doing it but you can't see into their setup.sh script to see what they are defining. https://docs.nersc.gov/jobs/examples/#annotated-example-automated-variable-time-jobs
Note they use "." to run the code in setup.sh in the same process instead of spawning a sub-shell. You need that, because traps and functions defined in a sub-shell are gone as soon as it exits.
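To make the "." point concrete, here is a small sketch you can run outside Slurm. The temporary file is only a hypothetical stand-in for setup.sh (I can't see what theirs contains), and on_usr1 is a name I made up for the demo.

```shell
#!/bin/bash
# Hypothetical stand-in for setup.sh: defines a handler and installs a trap.
setup_file=$(mktemp)
cat > "$setup_file" <<'EOF'
on_usr1() { caught=yes; }
trap on_usr1 USR1
EOF

# Run as a separate process: the function and trap vanish when it exits.
bash "$setup_file"
if command -v on_usr1 > /dev/null; then
    defined_after_run=yes
else
    defined_after_run=no
fi

# Source with ".": the function and trap now live in this shell.
. "$setup_file"
if command -v on_usr1 > /dev/null; then
    defined_after_source=yes
else
    defined_after_source=no
fi

# Signal ourselves; the shell runs the trap once the kill builtin returns.
caught=no
kill -USR1 $$

echo "defined after 'bash setup': $defined_after_run"
echo "defined after '. setup':    $defined_after_source"
echo "USR1 caught by this shell:  $caught"
rm -f "$setup_file"
```

Only the sourced version leaves a handler behind in the shell that Slurm will actually signal.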
These folks describe a nice method of creating the functions you need: Is it possible to detect *which* trap signal in bash?
The only thing they don't show there is the function that would actually take action on receiving the signal. Here's what I wrote that does it - put this in a file that can be included from any user's sbatch submit script and show them how to use it and the --signal option:
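# Install one handler for several signals, passing the signal name as $1.
trap_with_arg() {
    local func="$1" ; shift
    for sig ; do
        echo "setting trap for $sig"
        trap "$func $sig" "$sig"
    done
}

# Handler: acts on USR1, reports anything else.
func_trap() {
    echo "called with sig $1"
    case $1 in
        USR1)
            echo "caught SIGUSR1, making ABORT file"
            date
            # WORKDIR must be set by the submit script before the signal arrives
            cd "$WORKDIR" || return
            touch ABORT
            ls -l ABORT
            ;;
        *) echo "something else" ;;
    esac
}

trap_with_arg func_trap USR1 USR2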
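To tie this back to the original question, here is a rough sketch of a submit script that forwards USR1 to the program instead of creating an ABORT file. Treat it as a starting point under a few assumptions: the B: prefix and the @60 lead time are the forms described in the sbatch --signal documentation, the my_program function is only a stand-in so the script can be tried without the real binary, and the marked "demo only" line fakes the signal Slurm would send. The program is started in the background and followed by wait because bash does not run a trap until the foreground command has finished.

```shell
#!/bin/bash
#SBATCH -t 06:00:00
#SBATCH --signal=B:USR1@60   # ask Slurm to signal the batch shell ~60 s before the limit

# Hypothetical stand-in for my_program: writes output and quits on SIGUSR1.
# Remove this function to run the real program found on PATH.
my_program() {
    trap 'echo "my_program: got USR1, writing output"; exit 0' USR1
    while true; do sleep 1; done
}

out=$(mktemp)

# Handler in the batch shell: pass the signal on to the child.
forward_usr1() {
    echo "batch shell: forwarding USR1 to PID $child"
    kill -USR1 "$child"
}
trap forward_usr1 USR1

# Start the program in the background so the shell stays free to run traps.
my_program > "$out" &
child=$!

# Demo only: simulate Slurm's signal to the batch shell. Remove under sbatch.
( sleep 1; kill -USR1 $$ ) &

# wait returns early (status > 128) when a trapped signal arrives,
# so wait once more to collect the program's real exit status.
wait "$child"
status=$?
if [ "$status" -gt 128 ]; then
    wait "$child"
    status=$?
fi

echo "my_program exited with status $status"
cat "$out"
```

This does need a real script file rather than --wrap, but it is short and the forwarding part can be sourced from a shared file in the same way as the functions above.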