SLURM sbatch job array for the same script but with different input arguments run in parallel

I have a problem where I need to launch the same script but with different input arguments.

Say I have a script myscript.py -p <par_Val> -i <num_trial>, where I need to consider N different par_values (between x0 and x1) and M trials for each value of par_values.

Each of the M trials almost reaches the time limit of the cluster I am working on (and I don't have privileges to change it). So in practice I need to run NxM independent jobs.

Because each batch job has the same node/CPU configuration and invokes the same Python script, changing only the input parameters, in pseudo-language I should have an sbatch script that does something like:

#!/bin/bash
#SBATCH --job-name=cv_01
#SBATCH --output=cv_analysis_eis-%j.out
#SBATCH --error=cv_analysis_eis-%j.err
#SBATCH --partition=gpu2
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4

for p1 in 0.05 0.075 0.1 0.25 0.5
do
    for i in {0..150..5}
    do
        python myscript.py -p "$p1" -v "$i"
    done
done

where every call of the script is itself a batch job. Looking at the sbatch documentation, the -a/--array option seems promising, but in my case I need to change the input parameters for each of the NxM runs. How can I do this? I would rather not write NxM batch scripts and list them in a txt file, as suggested in another post. Another proposed solution does not seem ideal either, since imho this is a case for a job array.

Moreover, I would like all NxM jobs to be launched at the same time and the submitting script above to terminate right afterwards, so that it does not hit the time limit, which would make the system kill the whole job and leave it incomplete. Since each of the NxM jobs stays within the limit, this won't happen if they run in parallel but independently.

maurizio asked Jan 27 '17 18:01


1 Answer

The best approach is to use job arrays.

One option is to pass the parameter p1 when submitting the job script, so you will only have one script, but will have to submit it multiple times, once for each p1 value.

The code will be like this (untested):

#!/bin/bash
#SBATCH --job-name=cv_01
#SBATCH --output=cv_analysis_eis-%j-%a.out
#SBATCH --error=cv_analysis_eis-%j-%a.err
#SBATCH --partition=gpu2
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH -a 0-150:5

# $1 is the parameter value given on the sbatch command line
python myscript.py -p "$1" -v "$SLURM_ARRAY_TASK_ID"

and you will submit it with:

sbatch my_jobscript.sh 0.05
sbatch my_jobscript.sh 0.075
...
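
The repeated submissions above can also be generated by a loop. This is a minimal sketch, assuming the job script is saved as my_jobscript.sh (as in the commands above) and taking the parameter values from the question; the function name print_submissions is hypothetical. It only prints the sbatch commands, so it can be checked without access to a cluster:

```shell
#!/bin/bash
# Parameter values taken from the outer loop in the question.
PVALUES=(0.05 0.075 0.1 0.25 0.5)

# Hypothetical helper: print one sbatch command per parameter value.
# Nothing is submitted here; this is a dry run.
print_submissions() {
    local p
    for p in "${PVALUES[@]}"; do
        echo "sbatch my_jobscript.sh $p"
    done
}

print_submissions
```

Once the printed commands look right, piping them to a shell (print_submissions | bash) on the login node should submit one job array per parameter value.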

Another approach is to define all the p1 parameters in a bash array and submit NxM jobs (untested):

#!/bin/bash
#SBATCH --job-name=cv_01
#SBATCH --output=cv_analysis_eis-%j-%a.out
#SBATCH --error=cv_analysis_eis-%j-%a.err
#SBATCH --partition=gpu2
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
# Make the array NxM: 5 parameter values x 31 trials = 155 tasks (indices 0-154)
#SBATCH -a 0-154

PARRAY=(0.05 0.075 0.1 0.25 0.5)

# p1 is the element of PARRAY at index (task ID mod length of PARRAY)
p1=${PARRAY[$(( SLURM_ARRAY_TASK_ID % ${#PARRAY[@]} ))]}
# v is the integer division of the task ID by the length of PARRAY,
# scaled by 5 to reproduce the trial values 0, 5, ..., 150 from the question
v=$(( SLURM_ARRAY_TASK_ID / ${#PARRAY[@]} * 5 ))
python myscript.py -p "$p1" -v "$v"
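
The index arithmetic can be checked outside SLURM. The sketch below assumes the array indices run from 0 to NxM-1 (0-154 for the question's 5 parameter values and 31 trials) and that the trial values are 0, 5, ..., 150 as in the question's inner loop; decode_task and STEP are hypothetical names used only for illustration:

```shell
#!/bin/bash
PARRAY=(0.05 0.075 0.1 0.25 0.5)   # N = 5 parameter values (from the question)
STEP=5                              # assumed spacing of trial values: 0, 5, ..., 150

# Hypothetical helper: map one array index (0 .. N*M-1) to a "p v" pair.
decode_task() {
    local id=$1 n=${#PARRAY[@]}
    # id mod n selects the parameter; id div n selects the trial, scaled by STEP
    echo "${PARRAY[id % n]} $(( id / n * STEP ))"
}

decode_task 0      # prints: 0.05 0
decode_task 7      # prints: 0.1 5
decode_task 154    # prints: 0.5 150
```

Inside the job script, the same mapping would be applied to $SLURM_ARRAY_TASK_ID, so each array task gets a distinct (p1, v) pair.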
Carles Fenoy answered Sep 17 '22 18:09
