When scheduling a batch job in SLURM, e.g.
    sbatch -N 10 batch-script.sh
where batch-script.sh contains
    #!/bin/bash
    #SBATCH --job-name=jobname
    srun --label /usr/bin/hostname
it is possible to check which step is currently running with sacct:
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
...
       421.1   hostname                  test         10    RUNNING      0:0
But how can one check which tasks/nodes are still running in the current step and which have finished? (In this case there's only 1 task per node.) The only option I found in the docs is to set a --task-epilog command and log something when each task is done.
It would be great to see, for example, that 8 out of 10 nodes have finished their task, and node03 and node08 are still running theirs.
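For reference, the --task-epilog workaround mentioned above could look roughly like the sketch below. The script name task-epilog.sh, the TASK_LOG variable, and the log format are hypothetical, and it assumes SLURM_JOB_ID, SLURM_STEP_ID, SLURM_PROCID and SLURMD_NODENAME are visible in the epilog's environment (it falls back to placeholders if they are not).

```bash
#!/bin/bash
# task-epilog.sh (hypothetical name): meant to be run once after each task ends.
# Assumption: the SLURM_* variables below are set in the epilog's environment.

log_task_done() {
    # $1 is the log file; one line is appended per finished task.
    local logfile="$1"
    printf '%s job=%s step=%s task=%s done\n' \
        "${SLURMD_NODENAME:-$(uname -n)}" \
        "${SLURM_JOB_ID:-unknown}" \
        "${SLURM_STEP_ID:-unknown}" \
        "${SLURM_PROCID:-unknown}" >> "$logfile"
}

# TASK_LOG is a hypothetical override; the default is a file in the working
# directory, which is assumed to be on a filesystem shared by all nodes.
log_task_done "${TASK_LOG:-task-done.log}"
```

It would be passed to the step as srun --label --task-epilog=./task-epilog.sh /usr/bin/hostname, and nodes missing from the log would be the ones still running. Appending from many nodes to one shared file may interleave on some filesystems, so a per-node log file might be safer.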
You can see which nodes are active with the squeue command. To show only your own jobs, run squeue -u [yourname]. To keep the view refreshing every second, run watch -n 1 "squeue -u [yourname]".
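As a small sketch of the commands above, the two wrappers below are hypothetical helpers (my_jobs and my_steps are not part of SLURM). They assume squeue is on the PATH. The -o format string is one possible choice of columns, and -s asks squeue for job steps instead of jobs. As far as I know, squeue reports the nodes allocated to a job or step, so it may not show which individual tasks have already finished.

```bash
#!/bin/bash
# Hypothetical convenience wrappers around squeue.

# List a user's jobs with job ID, name, state, node count and node list.
my_jobs() {
    local user="${1:-$USER}"
    squeue -u "$user" -o '%.10i %.12j %.10T %.6D %R'
}

# List a user's job steps (e.g. 421.1) rather than whole jobs.
my_steps() {
    local user="${1:-$USER}"
    squeue -s -u "$user"
}
```

After sourcing the file, my_jobs or my_steps [yourname] prints a one-off listing, and the same squeue command line can be wrapped in watch -n 1 "..." for a live view.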