I have submitted a job to a SLURM queue, and the job has run to completion. When I then check the completed job with the sacct command, I notice additional entries in the output that I did not expect:
JobID         JobName  State      NCPUS  Timelimit
5297048       test     COMPLETED      1   00:10:00
5297048.bat+  batch    COMPLETED      1
5297048.ext+  extern   COMPLETED      1
Can anyone explain what the 'batch' and 'extern' jobs are and what their purpose is? And why does the extern job always complete, even when the primary job fails?
I have attempted to search the documentation but have not found a satisfactory and complete answer.
EDIT: Here's the script I am submitting to produce the above sacct output:
#!/bin/bash
echo test_script > done.txt
With the following sbatch command:
sbatch -A BRIDGE-CORE-SL2-CPU --nodes=1 --ntasks=1 -p skylake --cpus-per-task 1 -J jobname -t 00:10:00 --output=./output.out --error=./error.err < test.sh
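For reference, output in that shape can be reproduced by selecting the fields explicitly (the field names here are assumed from the column headers above):

sacct -j 5297048 --format=JobID,JobName,State,NCPUS,Timelimit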
A Slurm job is just a resource allocation. You can execute many job steps within that allocation, either in parallel or sequentially. Some jobs actually launch thousands of job steps this way. The job steps will be allocated nodes that are not already allocated to other job steps.
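As a quick sketch of that idea (the program names are placeholders), a single allocation can run job steps sequentially or in parallel:

#!/bin/bash
#SBATCH --ntasks=4

# Two sequential job steps, each using the full allocation:
srun -n 4 ./preprocess
srun -n 4 ./solve

# Or two job steps running side by side, each on half the tasks:
srun -n 2 ./part_a &
srun -n 2 ./part_b &
wait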
A Slurm job contains multiple job steps, which are all accounted for (in terms of resource usage) separately by Slurm. Usually, these steps are created using srun/mpirun and enumerated starting from 0. But in addition to that, there are sometimes two special steps. For example, take the following job:
sbatch -n 4 --wrap="srun hostname; srun echo Hello World"
This resulted in the following sacct output:
JobID         JobName   Partition  Account  AllocCPUS  State      ExitCode
------------  --------  ---------  -------  ---------  ---------  --------
5163571       wrap      medium     admin            4  COMPLETED  0:0
5163571.bat+  batch                admin            4  COMPLETED  0:0
5163571.ext+  extern               admin            4  COMPLETED  0:0
5163571.0     hostname             admin            4  COMPLETED  0:0
5163571.1     echo                 admin            4  COMPLETED  0:0
The two srun calls created the steps 5163571.0 and 5163571.1. The step 5163571.bat+ accounts for the resources needed by the batch script, which in this case is just srun hostname; srun echo Hello World (--wrap simply puts that into a file and adds #!/bin/sh).
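In other words, the --wrap call above is roughly equivalent to writing this script (file name chosen for illustration):

#!/bin/sh
srun hostname
srun echo Hello World

and submitting it with sbatch -n 4 job.sh.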
Many non-MPI programs do a lot of calculations in the batch step, so the resource usage is accounted there.
And now for 5163571.ext+: this step accounts for all resource usage by the job outside of Slurm's control. It only shows up if PrologFlags=Contain is set in slurm.conf.
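Whether that flag is set on a given cluster can be checked with:

scontrol show config | grep PrologFlags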
An example of processes that belong to a Slurm job but are not directly controlled by Slurm are ssh sessions. If you ssh into a node where one of your jobs is running, your session will be placed into the context of that job (and, if cgroups are set up, you will be limited to the resources available to the job). All calculations you do in that ssh session will then be accounted for in the .extern job step.
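The usage accumulated that way could then be inspected with something along these lines (job ID reused from the example above; the exact fields are just a suggestion):

sacct -j 5163571.extern --format=JobID,MaxRSS,TotalCPU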