Slurm sbatch directs stdout and stderr to the files specified by the -o and -e flags, but fails to do so if the file path contains directories that don't exist. Is there some way to automatically create the directories for my log files?
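For illustration, with a log path and job script name that are purely hypothetical:
# If log/2024/ doesn't already exist, the submission goes through but the
# output files are never written:
sbatch -o log/2024/%j.out -e log/2024/%j.err my_job.sh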
The only way I've found to do this is to wrap my calls to sbatch inside bash scripts that are many times longer than seems necessary for such a small thing. I've included a shortened example below.
#!/bin/bash
# Set up and run job array for my_script.py, which takes as positional
# arguments a config file (passed via $1) and an array index.
#SBATCH --array=1-100
#SBATCH -n 1
#SBATCH -t 12:00:00
#SBATCH -p short
#SBATCH -J sim_sumstats
#SBATCH --mem=1600
# Initialize variables used for script control flow
sub_or_main='sub'
# Parse options
while getopts ":A" opt; do
  case $opt in
    A)
      sub_or_main='main'
      ;;
    \?)
      # Capture invalid options
      echo "Invalid option: -$OPTARG" >&2
      exit 1
      ;;
  esac
done
shift $((OPTIND - 1))
# Either run the submit script or the main array
if [ "$sub_or_main" == 'sub' ]; then
  # Submit mode: create the folders for the log files, then call sbatch on
  # this same script in main mode.
  now=$(date +"%y%m%d-%H%M")
  name=$(basename "$1" .json)
  logpath="log/my_script_name/$name/$now"
  mkdir -p "$logpath"
  sbatch \
    -o "$logpath/%a.out" \
    -e "$logpath/%a.out" \
    "$0" -A "$1"
else
  # Main mode: just call my_script.py with the array task ID.
  python ./my_script.py "$1" "${SLURM_ARRAY_TASK_ID}"
fi
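To make the two modes concrete, a hypothetical invocation (submit_array.sh is just a stand-in name for the wrapper above):
# Called directly, the wrapper creates the log directory and re-submits
# itself to sbatch with -A; the queued copy then runs the array tasks.
./submit_array.sh configs/my_experiment.json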
A script like this works, but it seems awfully wasteful: I've more than doubled the length of my sbatch submit script just to organize my log files. Moreover, most of that added code will be nearly identical across the batch submit scripts for my other jobs (e.g. one calling my_script2.py), so it makes for a lot of duplication. I can't help but think there has to be a better way.
You can redirect the output of your Python script yourself in the submission script, and either discard the Slurm log or use it to record information about the job for provenance tracking and reproducibility.
You could have a submission script go like this:
#!/bin/bash
# Set up and run job array for my_script.py, which takes as positional
# arguments a config file (passed via $1) and an array index.
#SBATCH --array=1-100
#SBATCH -n 1
#SBATCH -t 12:00:00
#SBATCH -p short
#SBATCH -J sim_sumstats
#SBATCH --mem=1600
now=$(date +"%y%m%d-%H%M")
name=$(basename "$1" .json)
logpath="log/my_script_name/$name/$now"
# Create the log directory before the log file is written
mkdir -p "$logpath"
logfile="$logpath/${SLURM_ARRAY_TASK_ID}.out"
echo "Writing to ${logfile}"
# Record job details and the environment in the Slurm-managed log
scontrol show -dd job "$SLURM_JOB_ID"
printenv
# Redirect the Python script's own output to the file set up above
python ./my_script.py "$1" "${SLURM_ARRAY_TASK_ID}" > "${logfile}"
This way, the output of the Python script ends up exactly where you want it, and the parent directory is created before the log file is written.
Additionally, you will still have the standard output file created by Slurm, with the default naming scheme, holding information about the job (from scontrol) and about the environment (from printenv).
If you want to prevent Slurm from creating that output file in the first place, set --output=/dev/null.
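A minimal sketch of that last option, assuming you want to silence the Slurm-managed file entirely; note that when --error is not set, stderr follows stdout, so adding 2>&1 keeps it in the per-task log instead:
#!/bin/bash
#SBATCH --array=1-100
#SBATCH --output=/dev/null
# ... same log directory setup as in the script above ...
python ./my_script.py "$1" "${SLURM_ARRAY_TASK_ID}" > "${logfile}" 2>&1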