The terminology used in the sbatch man page might be a bit confusing, so I want to be sure I am setting the options correctly. Suppose I have a task to run on a single node with N threads. Am I correct in assuming that I would use --nodes=1 and --ntasks=N?
I am used to thinking in terms of, for example, using pthreads to create N threads within a single process. Is that what they refer to as "cores" or "cpus per task"? CPUs and threads are not the same thing in my mind.
Each regular compute node has 64 cores, 500 GB of available memory, and GigE and EDR (100 Gbit/s) InfiniBand interconnects. The HPC-Class partitions are accessible via the ISU HPC Nova cluster.
A node is the name usually used for one unit (usually one computer) in a cluster. Generally, this computer has one or two CPU sockets, each normally with more than one core. All cores on a node share the node's memory (although access times can differ between sockets); memory is not shared between nodes.
Threads are lightweight, fast execution contexts that independently process portions of the work. With shared-memory parallelism, a job is limited to the total memory and cores of a single node. To run a threaded application properly through Slurm, you need to specify a few Slurm options. Which ones depends on the kind of parallelism you are using, distributed memory or shared memory:
--ntasks=# : number of "tasks" (use with distributed-memory parallelism).
--ntasks-per-node=# : number of "tasks" per node (use with distributed-memory parallelism).
--cpus-per-task=# : number of CPUs allocated to each task (use with shared-memory parallelism).
From this question: if every node has 24 cores, is there any difference between these commands?
sbatch --ntasks 24 [...]
sbatch --ntasks 1 --cpus-per-task 24 [...]
Answer (by Matthew Mjelde):
Yes, there is a difference between those two submissions. You are correct that usually --ntasks is for MPI and --cpus-per-task is for multithreading, but let's look at your commands:
For your first example, sbatch --ntasks 24 [...] will allocate a job with 24 tasks. Each task in this case gets only 1 CPU, and the tasks may be split across multiple nodes. So you get a total of 24 CPUs, possibly spread over several nodes.
For your second example, sbatch --ntasks 1 --cpus-per-task 24 [...] will allocate a job with 1 task and 24 CPUs for that task. Thus you get a total of 24 CPUs on a single node.
In other words, a task cannot be split across multiple nodes. Therefore, requesting cores with --cpus-per-task ensures they are all allocated on the same node, while requesting them with --ntasks may spread them across multiple nodes.
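To make the accounting concrete: in both submissions the total core count is ntasks × cpus-per-task; only the placement differs. A small sketch:

```shell
# Both submissions request 24 cores in total; only placement differs:
#   sbatch --ntasks 24 [...]                  -> 24 tasks x 1 CPU, possibly on several nodes
#   sbatch --ntasks 1 --cpus-per-task 24 [...] -> 1 task x 24 CPUs, all on one node
ntasks=1
cpus_per_task=24
total=$(( ntasks * cpus_per_task ))
echo "total cores requested: $total"
```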
Another good Q&A from CÉCI's support website: Suppose you need 16 cores. Here are some use cases:
- you use mpi and do not care about where those cores are distributed:
--ntasks=16
- you want to launch 16 independent processes (no communication):
--ntasks=16
- you want those cores to spread across distinct nodes:
--ntasks=16 and --ntasks-per-node=1
or --ntasks=16 and --nodes=16
- you want those cores to spread across distinct nodes and no interference from other jobs:
--ntasks=16 --nodes=16 --exclusive
- you want 16 processes to spread across 8 nodes to have two processes per node:
--ntasks=16 --ntasks-per-node=2
- you want 16 processes to stay on the same node:
--ntasks=16 --ntasks-per-node=16
- you want one process that can use 16 cores for multithreading:
--ntasks=1 --cpus-per-task=16
- you want 4 processes that can use 4 cores each for multithreading:
--ntasks=4 --cpus-per-task=4
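The last case is the hybrid MPI + multithreading layout. A minimal sketch of such a job script, assuming a hypothetical binary ./hybrid_app:

```shell
#!/bin/bash
#SBATCH --ntasks=4                # 4 MPI ranks
#SBATCH --cpus-per-task=4         # 4 cores (threads) per rank

# Each rank spawns SLURM_CPUS_PER_TASK threads; ranks x threads
# gives the total core count of the allocation (4 x 4 = 16).
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-4}"
total_cores=$(( ${SLURM_NTASKS:-4} * OMP_NUM_THREADS ))
echo "ranks x threads = $total_cores cores"
# srun ./hybrid_app               # hypothetical hybrid MPI+OpenMP binary
```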