 

Create directory for log file before calling slurm sbatch

Tags: bash, slurm

Slurm's sbatch directs stdout and stderr to the files given by the -o and -e flags, but fails to do so if the file path contains directories that don't exist. Is there some way to automatically create the directories for my log files?

  • Manually creating these directories each time is inefficient because I'm running each sbatch submission dozens of times.
  • Encoding the variation across job names in filenames rather than directories leaves a huge, poorly organized mess of logs to sort through whenever I need to check how my jobs did.

The only way I've found to do this is to wrap my calls to sbatch inside bash scripts that are many times longer than seems necessary for such a small thing. I've included a shortened example below.

#!/bin/bash
# Set up and run job array for my_script.py, which takes as positional
# arguments a config file (passed via $1) and an array index.

#SBATCH --array=1-100
#SBATCH -n 1
#SBATCH -t 12:00:00
#SBATCH -p short
#SBATCH -J sim_sumstats
#SBATCH --mem=1600

# Initialize variables used for script control flow
sub_or_main='sub'

# Parse options
while getopts ":A" opt; do
    case $opt in
        A)
            sub_or_main='main'
            ;;
        \?)
            # Capture invalid options
            echo "Invalid option: -$OPTARG" >&2
            exit 1
            ;;
    esac
done

shift $((OPTIND - 1))

# Either run the submit script or the main array
if [ "$sub_or_main" == 'sub' ]; then
    # Submit script creates folders for log files, then calls sbatch on this
    # script in main mode.
    now=$(date +"%y%m%d-%H%M")
    name=$(basename "$1" .json)
    logpath="log/my_script_name/$name/$now"
    mkdir -p "$logpath"
    sbatch \
        -o "$logpath/%a.out" \
        -e "$logpath/%a.out" \
        "$0" -A "$1"
else
    # Main loop. Just calls my_script.py with the array ID.
    python ./my_script.py "$1" "${SLURM_ARRAY_TASK_ID}"
fi

Having a script like this works, but it seems awfully wasteful: I've more than doubled the length of my sbatch submit script just to organize my log files. Moreover, most of that added code will be nearly identical across the submit scripts for my other jobs (e.g. ones calling my_script2.py), so it makes for a lot of code duplication; the sketch below shows the repeated part. I can't help but think there has to be a better way.
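To make the duplication concrete, the repeated boilerplate amounts to a wrapper like the following sketch (the name submit_with_logs.sh and its two-argument interface are my own invention, just to illustrate; it factors out the same mkdir-and-sbatch dance as above):

#!/bin/bash
# submit_with_logs.sh (hypothetical helper): create a dated log directory,
# then submit the given batch script with -o/-e pointed into it.
# Usage: ./submit_with_logs.sh my_batch_script.sh config.json
script=$1
config=$2

now=$(date +"%y%m%d-%H%M")
name=$(basename "$config" .json)
logpath="log/$(basename "$script" .sh)/$name/$now"
mkdir -p "$logpath"

sbatch -o "$logpath/%a.out" -e "$logpath/%a.out" "$script" "$config"

Even with that factored out, every job still needs its own batch script plus a call to the wrapper, which is the overhead I'd like to avoid.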

asked Jan 25 '19 by Empiromancer



1 Answer

You can redirect the output of your Python script yourself in the submission script, and either discard the Slurm log or use it to record interesting information about the job for provenance tracking and reproducibility.

You could have a submission script go like this:

#!/bin/bash
# Set up and run job array for my_script.py, which takes as positional
# arguments a config file (passed via $1) and an array index.

#SBATCH --array=1-100
#SBATCH -n 1
#SBATCH -t 12:00:00
#SBATCH -p short
#SBATCH -J sim_sumstats
#SBATCH --mem=1600

# Build the log directory and create it before anything is written there
now=$(date +"%y%m%d-%H%M")
name=$(basename "$1" .json)
logpath="log/my_script_name/$name/$now"
mkdir -p "$logpath"
logfile="$logpath/${SLURM_ARRAY_TASK_ID}.out"

# Record job metadata in Slurm's own log for provenance
echo "Writing to ${logfile}"
scontrol show -dd job "$SLURM_JOB_ID"
printenv

# Redirect the Python script's output to the file we just prepared
# (add 2>&1 after the redirection if stderr should go there as well)
python ./my_script.py "$1" "${SLURM_ARRAY_TASK_ID}" > "${logfile}"

This way, the output from the Python script ends up where you want it, and the parent directory is created before the log file is written.

Additionally, you will still have the standard output file created by Slurm, with the default naming scheme, holding information about the job (from scontrol) and about the environment (from printenv).

But if you want to prevent Slurm from creating its own output file altogether, set --output=/dev/null.
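For example, the header of the submission script above would then start like this (a minimal sketch reusing directives already shown; only the --output line is new):

#!/bin/bash
# Discard Slurm's own log file, since the script redirects output itself
#SBATCH --output=/dev/null
#SBATCH --array=1-100
#SBATCH -J sim_sumstats

With --output=/dev/null you lose the scontrol/printenv provenance record, so pick one approach or the other depending on whether you want that extra file.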

answered Oct 20 '22 by damienfrancois