How can I send a timeout signal to a wrapped command in sbatch?

Tags:

slurm

sbatch

I have a program that, when it receives a SIGUSR1, writes some output and quits. I'm trying to get sbatch to notify this program before timing out.
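To reproduce this without the real program, a stand-in for my_program can help. The sketch below is hypothetical: the name fake_program and its output file are made up for illustration. It loops until it receives SIGUSR1, then writes a result file and exits.

```shell
#!/bin/bash
# Hypothetical stand-in for my_program (illustration only): it counts work
# steps and, on SIGUSR1, writes a result file and exits.
fake_program() {
    local out="${1:-fake_program.out}"
    local steps=0
    # Single quotes: $steps and $out are expanded when the trap fires.
    trap 'echo "got SIGUSR1 after $steps steps" > "$out"; exit 0' USR1
    while true; do
        steps=$((steps + 1))
        sleep 0.2   # bash runs the pending trap once this sleep returns
    done
}
```

Run it in the background (for example `fake_program result.txt &`, then `kill -USR1 $!`) or from its own script, because the trap calls `exit`, which would otherwise end the calling shell.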

I enqueue the program using:

sbatch -t 06:00:00 --signal=USR1 ... --wrap my_program

but my_program never receives the signal. I've also tried sending the signal manually while the program is running, with scancel -s USR1 <JOBID>, without success. I also tried scancel --full, but that kills the wrapper shell and my_program is never notified.

One option is to write a bash script that wraps my_program, traps the signal, and forwards it to my_program (similar to this example), but I don't need that cumbersome script for anything else. Also, the sbatch --signal documentation clearly says that you need to specify --signal=B:<sig> when you want to signal the enclosing batch shell, so I believe the bash wrapper should not be necessary.
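For comparison, the forwarding-wrapper option can be fairly short. This is a minimal sketch assuming bash; run_with_forwarding is a hypothetical name, not a Slurm feature. It starts the real command in the background, passes SIGUSR1 on to it, and waits for it. The background start matters: bash does not run a trap until the current foreground command finishes, whereas a trapped signal makes the `wait` builtin return immediately.

```shell
#!/bin/bash
# Sketch of a forwarding wrapper (hypothetical helper, not a Slurm feature):
# start the real command in the background, pass SIGUSR1 on to it, wait for it.
run_with_forwarding() {
    "$@" &
    local child=$!
    local forwarded=0
    # Double quotes: the child's PID is fixed into the trap when it is set.
    trap "forwarded=1; kill -USR1 $child 2>/dev/null" USR1
    local status=0
    # A trapped signal makes "wait" return early with a status above 128 ...
    wait "$child" || status=$?
    # ... so after forwarding, wait again to collect the program's own status.
    if [ "$forwarded" -eq 1 ]; then
        status=0
        wait "$child" || status=$?
    fi
    trap - USR1
    return "$status"
}

# At the end of a batch script submitted with --signal=B:USR1@<seconds>:
#   run_with_forwarding my_program
```

With `B:`, Slurm signals the batch shell, and the wrapper then relays the signal to the program.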

So, is there a way to send a SIGUSR1 signal to a program enqueued using sbatch --wrap?

asked Oct 17 '25 01:10 by Nicolas Loira


1 Answer

Your command is sending the USR1 to the shell created by the --wrap. However, if you want the signal to be caught and processed, you're going to need to write the shell functions to handle the signal and that's probably too much for a --wrap command.

These folks are doing it, but you can't see inside their setup.sh script to tell what they define there: https://docs.nersc.gov/jobs/examples/#annotated-example-automated-variable-time-jobs

Note that they use "." to run the code in setup.sh in the same shell process instead of spawning a sub-shell. You need that, because traps set in a sub-shell don't affect the parent shell.
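A quick way to see why "." matters: a trap belongs to the shell process that sets it. In this sketch a temporary file stands in for setup.sh.

```shell
#!/bin/bash
# A trap set in a child shell disappears with that shell; a sourced file
# sets it in the current shell. The temp file is a stand-in for setup.sh.
helper=$(mktemp)
echo 'trap "echo caught USR1" USR1' > "$helper"

bash "$helper"      # runs in a child shell, so its trap vanishes on exit
trap -p USR1        # no USR1 trap in this shell yet

. "$helper"         # runs in the current shell
trap -p USR1        # now lists the USR1 trap installed by the helper

rm -f "$helper"
```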

These folks describe a nice method of creating the functions you need, in "Is it possible to detect *which* trap signal in bash?"

The only thing they don't show there is the function that actually takes action when the signal arrives. Here's what I wrote to do that. Put this in a file that can be included (with ".") from any user's sbatch submit script, and show them how to use it along with the --signal option:

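# Install "$func <signal-name>" as the trap for each signal given, so a
# single handler can tell which signal it was called for.
trap_with_arg() {
    local func="$1" ; shift
    local sig
    for sig ; do
        echo "setting trap for $sig"
        trap "$func $sig" "$sig"
    done
}

# Handler: on SIGUSR1, create an ABORT file in $WORKDIR for the job's
# program to notice; any other trapped signal is only reported.
func_trap () {
    echo "called with sig $1"
    case $1 in
        USR1)
            echo "caught SIGUSR1, making ABORT file"
            date
            cd "$WORKDIR" || return 1
            touch ABORT
            ls -l ABORT
        ;;
        *) echo "something else" ;;
    esac
}

trap_with_arg func_trap USR1 USR2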
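One possible way to wire this into a submit script, as a sketch rather than a tested recipe: TRAP_HELPERS (the path of the file holding the functions above) and run_job are hypothetical names. WORKDIR has to be set because func_trap cds into it, and the program runs in the background under `wait` so the trap fires as soon as the signal arrives instead of after the program exits. `--signal=B:USR1@300` asks Slurm to send USR1 to the batch shell about 300 seconds before the time limit.

```shell
#!/bin/bash
#SBATCH -t 06:00:00
#SBATCH --signal=B:USR1@300
#
# Sketch of a submit script using the trap helpers. TRAP_HELPERS (path to the
# helper file) and the command given to run_job are placeholders.

run_job() {
    # func_trap cds into $WORKDIR, so make sure it is set.
    export WORKDIR="${WORKDIR:-$PWD}"
    # Source with "." so the traps are installed in this shell, not a child.
    . "${TRAP_HELPERS:?set TRAP_HELPERS to the helper file}"
    # Run the program in the background: bash postpones traps while a
    # foreground command runs, but a trapped signal interrupts "wait".
    "$@" &
    local child=$!
    local status=0
    wait "$child" || status=$?
    # If the trap fired, "wait" returned early; keep waiting while the
    # program is still running so the job doesn't end before it reacts.
    while kill -0 "$child" 2>/dev/null; do
        status=0
        wait "$child" || status=$?
    done
    return "$status"
}

# The real submit script would end with something like:
#   run_job my_program
```

This assumes my_program watches for the ABORT file; the trap itself only creates it.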
answered Oct 20 '25 08:10 by Mike Diehn