
How to hold up a script until a Slurm job (started with srun) is completely finished?

I am running a job array with SLURM, using the following job array script (which I run with sbatch job_array_script.sh [args]):

#!/bin/bash

#SBATCH ... other options ...

#SBATCH --array=0-999%200

srun ./job_slurm_script.py $1 $2 $3 $4

echo 'open' > status_file.txt

To explain: I want job_slurm_script.py to be run as an array job 1000 times, with at most 200 tasks in parallel. When all of those are done, I want to write 'open' to status_file.txt. This is because in reality I have more than 10,000 jobs, which is above my cluster's MaxSubmissionLimit, so I need to split them into smaller chunks (1000-element job arrays) and run the chunks one after the other, each starting only when the previous one has finished.

However, for this to work, the echo statement must only trigger once the entire job array is finished (outside of this, I have a loop which checks status_file.txt to see if the job is finished, i.e. when the contents are the string 'open').

Up to now I thought that srun holds the script up until the whole job array is finished. However, sometimes srun "returns" and the script goes to the echo statement before the jobs are finished, so all the subsequent jobs bounce off the cluster since it goes above the submission limit.

So how do I make srun "hold up" until the whole job array is finished?

Marses asked Sep 26 '17 12:09




1 Answer

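You can add the --wait flag to sbatch. With it, sbatch does not exit until the submitted job terminates; for a job array, that means until every task in the array has finished.

Check the manual page of sbatch for more information about --wait.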
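
As a rough sketch of how this could be wired up: every task of a job array runs the batch script itself, so an echo placed inside job_array_script.sh fires as soon as any one task's srun returns, not when the whole array is done. One option, assuming the echo line is removed from the batch script, is to write the status file from a wrapper that runs on the login node after sbatch --wait returns. The function name run_array_then_mark_open and the SBATCH_CMD override are hypothetical, introduced only for illustration (the override lets the function be dry-run without a cluster).

```shell
#!/bin/bash
# Hypothetical wrapper, run directly on the login node (not submitted with sbatch).
# SBATCH_CMD is an assumed override so the function can be dry-run off-cluster.
SBATCH_CMD="${SBATCH_CMD:-sbatch}"

run_array_then_mark_open() {
    # --wait makes sbatch block until the submitted job, here every task
    # of the array, has terminated; sbatch's exit code reflects the job's.
    if "$SBATCH_CMD" --wait job_array_script.sh "$@"; then
        # Only now is the whole array finished, so the next chunk may start.
        echo 'open' > status_file.txt
    else
        echo "job array failed or was not accepted" >&2
        return 1
    fi
}

# Usage (arguments are forwarded to job_array_script.sh):
#   run_array_then_mark_open arg1 arg2 arg3 arg4
```

Writing 'open' only on success is a design choice in this sketch; if the next chunk should start even when some tasks fail, the echo can be moved outside the if.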

Aditya Kulkarni answered Nov 15 '22 09:11