The terminology used in the sbatch man page might be a bit confusing, so I want to be sure I am setting the options correctly. Suppose I have a task to run on a single node with N threads. Am I correct in assuming that I would use --nodes=1 and --ntasks=N?
I am used to thinking in terms of, for example, using pthreads to create N threads within a single process. Is that what they refer to as "cores" or "cpus per task"? CPUs and threads are not the same thing in my mind.
Each regular compute node has 64 cores, 500 GB of available memory, and GigE and EDR (100 Gbit/s) InfiniBand interconnects. The HPC-Class partitions are accessible via the ISU HPC Nova cluster.
A node is the name usually used for one unit (usually one computer) in a cluster. Generally, this computer has one or two CPU sockets, each normally with more than one core. All cores on a node share the node's memory (although access times can differ between sockets); memory is not shared between nodes.
Threads are lightweight, fast execution contexts that independently process portions of the work. With shared-memory parallelism, a job is limited to the total memory and cores of a single node. To run a threaded application properly through Slurm, you need to specify a few Slurm options. Which ones depends on the kind of parallelism you are using, distributed memory or shared memory:
--ntasks=# : number of "tasks" (use with distributed-memory parallelism).
--ntasks-per-node=# : number of "tasks" per node (use with distributed-memory parallelism).
--cpus-per-task=# : number of CPUs allocated to each task (use with shared-memory parallelism).
From this question: if every node has 24 cores, is there any difference between these commands?
sbatch --ntasks 24 [...]
sbatch --ntasks 1 --cpus-per-task 24 [...]
Answer (by Matthew Mjelde):
Yes, there is a difference between those two submissions. You are correct that usually --ntasks is for MPI and --cpus-per-task is for multithreading, but let's look at your commands:
For your first example, sbatch --ntasks 24 [...] will allocate a job with 24 tasks. Each task in this case gets only 1 CPU, and the tasks may be split across multiple nodes. So you get a total of 24 CPUs, possibly spread over several nodes.
For your second example, sbatch --ntasks 1 --cpus-per-task 24 [...] will allocate a job with 1 task and 24 CPUs for that task. Thus you get a total of 24 CPUs on a single node.
In other words, a task cannot be split across multiple nodes. Therefore, requesting cores with --cpus-per-task ensures they are all allocated on the same node, while requesting them with --ntasks may spread them across multiple nodes.
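To make the accounting concrete: in both submissions the total core count is ntasks × cpus-per-task; only the placement differs. A small sketch:

```shell
# Both submissions request 24 cores in total; only placement differs:
#   sbatch --ntasks 24 [...]                  -> 24 tasks x 1 CPU, possibly on several nodes
#   sbatch --ntasks 1 --cpus-per-task 24 [...] -> 1 task x 24 CPUs, all on one node
ntasks=1
cpus_per_task=24
total=$(( ntasks * cpus_per_task ))
echo "total cores requested: $total"
```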
Another good Q&A from CÉCI's support website: Suppose you need 16 cores. Here are some use cases:
- you use mpi and do not care about where those cores are distributed:
--ntasks=16
- you want to launch 16 independent processes (no communication):
--ntasks=16
- you want those cores to spread across distinct nodes:
--ntasks=16 and --ntasks-per-node=1
or --ntasks=16 and --nodes=16
- you want those cores to spread across distinct nodes and no interference from other jobs:
--ntasks=16 --nodes=16 --exclusive
- you want 16 processes to spread across 8 nodes to have two processes per node:
--ntasks=16 --ntasks-per-node=2
- you want 16 processes to stay on the same node:
--ntasks=16 --ntasks-per-node=16
- you want one process that can use 16 cores for multithreading:
--ntasks=1 --cpus-per-task=16
- you want 4 processes that can use 4 cores each for multithreading:
--ntasks=4 --cpus-per-task=4
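The last case is the hybrid MPI + multithreading layout. A minimal sketch of such a job script, assuming a hypothetical binary ./hybrid_app:

```shell
#!/bin/bash
#SBATCH --ntasks=4                # 4 MPI ranks
#SBATCH --cpus-per-task=4         # 4 cores (threads) per rank

# Each rank spawns SLURM_CPUS_PER_TASK threads; ranks x threads
# gives the total core count of the allocation (4 x 4 = 16).
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-4}"
total_cores=$(( ${SLURM_NTASKS:-4} * OMP_NUM_THREADS ))
echo "ranks x threads = $total_cores cores"
# srun ./hybrid_app               # hypothetical hybrid MPI+OpenMP binary
```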