How do the terms "job", "task", and "step" as used in the SLURM docs relate to each other?
AFAICT, a job may consist of multiple tasks, and it may also consist of multiple steps, but, assuming this is true, it's still not clear to me how tasks and steps relate.
It would be helpful to see an example showing the full complexity of jobs/tasks/steps.
A job consists of one or more steps, each consisting of one or more tasks, each using one or more CPUs.
Jobs are typically created with the `sbatch` command, steps are created with the `srun` command, tasks are requested at the job level with `--ntasks` or `--ntasks-per-node`, or at the step level with `--ntasks`. CPUs are requested per task with `--cpus-per-task`. Note that jobs submitted with `sbatch` have one implicit step: the Bash script itself.
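A minimal sketch of how those options fit together (the resource sizes and commands are arbitrary placeholders, and this assumes a working SLURM installation):

```shell
#!/bin/bash
#SBATCH --ntasks=4         # 4 tasks at the job level...
#SBATCH --cpus-per-task=2  # ...each with 2 CPUs, so 8 CPUs total.

# Commands run directly in the script execute once, in the implicit
# batch step, on the first allocated node.
hostname

# srun creates an explicit step; by default it inherits the job's
# task count, so this launches 4 parallel copies of the command.
srun hostname

# A step may also request a sub-allocation of the job's tasks.
srun --ntasks=2 hostname
```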
Assume the hypothetical job:
```shell
#SBATCH --nodes 8
#SBATCH --ntasks-per-node 8

# The job requests 64 CPUs, on 8 nodes.

# First step, with a sub-allocation of 8 tasks (one per node) to create a tmp dir.
# No need for more than one task per node, but it has to run on every node.
srun --nodes 8 --ntasks 8 mkdir -p /tmp/$USER/$SLURM_JOBID

# Second step with the full allocation (64 tasks) to run an MPI
# program on some data to produce some output.
srun process.mpi <input.dat >output.txt

# Third step with a sub-allocation of 48 tasks (because, for instance,
# that program does not scale as well) to post-process the output and
# extract meaningful information.
srun --ntasks 48 --nodes 6 --exclusive postprocess.mpi <output.txt >result.txt &

# Fourth step with a sub-allocation on a single node
# to compress the raw output. This step runs at the same time as
# the previous one thanks to the ampersand `&`.
srun --ntasks 12 --nodes 1 --exclusive compress.mpi output.txt &
wait
```
Four steps were created, so the accounting information for that job will have five lines: one per step, plus one for the Bash script itself.
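You can see those per-step accounting lines with `sacct`; steps show up as `JOBID.batch`, `JOBID.0`, `JOBID.1`, and so on. A sketch, assuming the job above was assigned the hypothetical job ID 1234:

```shell
# Query accounting for the job, one line per step (job ID 1234 is assumed).
sacct -j 1234 --format=JobID,JobName,NTasks
# The output would include the batch step plus the four srun steps,
# with JobIDs of the form 1234.batch, 1234.0, 1234.1, 1234.2, 1234.3.
```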