
Ensure hybrid MPI / OpenMP runs each OpenMP thread on a different core

I am trying to get a hybrid OpenMP / MPI job to run so that OpenMP threads are separated by core (only one thread per core). I have seen other answers that use numactl and bash scripts to set environment variables, and I don't want to do this.

I would like to be able to do this only by setting OMP_NUM_THREADS and/or OMP_PROC_BIND and mpiexec options on the command line. I have tried the following. Let's say I want 2 MPI processes that each have 2 OpenMP threads, with each thread running on a separate core, so I want 4 cores in total.

OMP_PROC_BIND=true OMP_PLACES=cores OMP_NUM_THREADS=2 mpiexec -n 2 

This splits the job so that only two processes are at work, and they are all on the same CPU, so each of them uses only about 25% of the CPU. If I try:

OMP_PROC_BIND=false OMP_PLACES=cores OMP_NUM_THREADS=2 mpiexec -n 2

then I just get two separate MPI processes, each running at 100% or over 100% of its CPU, according to top. This doesn't seem to show different cores being used for the OpenMP threads.

How do I force the system to put separate threads on separate cores?

FYI, lscpu prints this:

CPU(s):                48
On-line CPU(s) list:   0-47
Thread(s) per core:    2
Core(s) per socket:    12
Socket(s):             2
NUMA node(s):          2
v2v1 asked Dec 14 '17 20:12




1 Answer

Actually, I'd expect your first example to work. Setting OMP_PROC_BIND=true is important here, so that OpenMP stays within the CPU binding of the MPI process when pinning its threads.

Depending on the batch system and MPI implementation, the way to set these things up can differ considerably.

Hyperthreading, or more generally multiple hardware threads per core, might also be part of the problem. Every hardware thread shows up as a separate CPU (a "core") in Linux, and you will never see 200% when two processes run on the two Hyperthreads of one core.
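To make that concrete for the machine in the question, here is a minimal sketch of the arithmetic, using the values from the lscpu output there. The function name topology_counts is only illustrative, and it assumes a symmetric machine (same core count on every socket).

```python
def topology_counts(sockets, cores_per_socket, threads_per_core):
    """Return (physical_cores, logical_cpus) for a symmetric machine."""
    physical_cores = sockets * cores_per_socket
    logical_cpus = physical_cores * threads_per_core
    return physical_cores, logical_cpus


# Values copied from the lscpu output in the question.
physical_cores, logical_cpus = topology_counts(
    sockets=2, cores_per_socket=12, threads_per_core=2
)

# Desired layout from the question: 2 MPI ranks x 2 OpenMP threads.
threads_wanted = 2 * 2

print(f"physical cores: {physical_cores}")  # 24
print(f"logical CPUs:   {logical_cpus}")    # 48, matches "CPU(s): 48"
print(f"fits one thread per core: {threads_wanted <= physical_cores}")
```

So the 48 "CPUs" that Linux reports are 24 physical cores, and the 4 threads you want fit comfortably with one thread per core.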

Here is a generic approach that I use when figuring these things out for a given MPI implementation, OpenMP implementation, and system. There is documentation from Cray that contains a very helpful program for this, called xthi.c. You can find it by searching for the filename (I'm not sure it's legal to paste it here). Compile it with:

mpicc xthi.c -fopenmp -o xthi

Now we can see exactly what is going on. For instance, on a dual-socket Xeon with 8 cores per socket, Hyperthreading, and Intel MPI (MPICH-based), we get:

$ OMP_PROC_BIND=true OMP_PLACES=cores OMP_NUM_THREADS=2 mpiexec -n 2 ./xthi

Hello from rank 0, thread 0, on localhost. (core affinity = 0,16)
Hello from rank 0, thread 1, on localhost. (core affinity = 1,17)
Hello from rank 1, thread 0, on localhost. (core affinity = 8,24)
Hello from rank 1, thread 1, on localhost. (core affinity = 9,25)

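As you can see, a place of type cores means all the Hyperthreads of one core. Note also that, by default, the MPI launcher pins the two ranks to different sockets. With OMP_PLACES=threads, each OpenMP thread is pinned to a single hardware thread, and here each one ends up on a different core: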

$ OMP_PROC_BIND=true OMP_PLACES=threads OMP_NUM_THREADS=2 mpiexec -n 2 ./xthi
Hello from rank 0, thread 0, on localhost. (core affinity = 0)
Hello from rank 0, thread 1, on localhost. (core affinity = 1)
Hello from rank 1, thread 0, on localhost. (core affinity = 8)
Hello from rank 1, thread 1, on localhost. (core affinity = 9)
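If you have more than a handful of lines to look at, a small script can summarize them. This is only a sketch: the regular expression assumes the exact line format shown above, and the names expand_cpu_list and parse_xthi are made up for this example.

```python
import re

# Assumed line format, copied from the xthi output above.
LINE_RE = re.compile(
    r"Hello from rank (\d+), thread (\d+), on (\S+)\. "
    r"\(core affinity = ([\d,\-]+)\)"
)


def expand_cpu_list(spec):
    """Expand a CPU list such as '0-7,16-23' into a sorted list of ints."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def parse_xthi(output):
    """Map (rank, thread) -> list of logical CPUs the thread may run on."""
    affinity = {}
    for line in output.splitlines():
        match = LINE_RE.search(line)
        if match:
            rank, thread, _host, spec = match.groups()
            affinity[(int(rank), int(thread))] = expand_cpu_list(spec)
    return affinity


places_cores = """\
Hello from rank 0, thread 0, on localhost. (core affinity = 0,16)
Hello from rank 0, thread 1, on localhost. (core affinity = 1,17)
Hello from rank 1, thread 0, on localhost. (core affinity = 8,24)
Hello from rank 1, thread 1, on localhost. (core affinity = 9,25)
"""

places_threads = """\
Hello from rank 0, thread 0, on localhost. (core affinity = 0)
Hello from rank 0, thread 1, on localhost. (core affinity = 1)
Hello from rank 1, thread 0, on localhost. (core affinity = 8)
Hello from rank 1, thread 1, on localhost. (core affinity = 9)
"""

for label, text in [("cores", places_cores), ("threads", places_threads)]:
    for (rank, thread), cpus in sorted(parse_xthi(text).items()):
        print(f"OMP_PLACES={label}: rank {rank} thread {thread} -> "
              f"{len(cpus)} logical CPU(s) {cpus}")
```

For the two runs above it reports two logical CPUs per thread with OMP_PLACES=cores and one per thread with OMP_PLACES=threads.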

With OMP_PROC_BIND=false (your second example), I get:

$ OMP_PROC_BIND=false OMP_PLACES=cores OMP_NUM_THREADS=2 mpiexec -n 2 ./xthi
Hello from rank 0, thread 0, on localhost. (core affinity = 0-7,16-23)
Hello from rank 0, thread 1, on localhost. (core affinity = 0-7,16-23)
Hello from rank 1, thread 0, on localhost. (core affinity = 8-15,24-31)
Hello from rank 1, thread 1, on localhost. (core affinity = 8-15,24-31)

Here, each OpenMP thread is allowed to run anywhere on its rank's socket, so the MPI ranks still operate on distinct resources. However, the OS is free to move the OpenMP threads of one process around between all the cores of that socket. On my test system, this is the same as just setting OMP_NUM_THREADS=2.
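One way to see the difference between the pinned and unpinned runs is to check which threads have overlapping affinity masks. A rough sketch follows; the masks are typed in by hand from the xthi outputs above, and shared_cpus is just a helper name chosen for this example.

```python
from itertools import combinations


def shared_cpus(affinity):
    """Return {(a, b): cpus} for every pair of threads whose masks overlap."""
    overlaps = {}
    for (a, cpus_a), (b, cpus_b) in combinations(sorted(affinity.items()), 2):
        common = set(cpus_a) & set(cpus_b)
        if common:
            overlaps[(a, b)] = sorted(common)
    return overlaps


# Masks from the OMP_PROC_BIND=false run: one whole socket per rank.
socket0 = set(range(0, 8)) | set(range(16, 24))
socket1 = set(range(8, 16)) | set(range(24, 32))
unbound = {
    (0, 0): socket0, (0, 1): socket0,
    (1, 0): socket1, (1, 1): socket1,
}

# Masks from the OMP_PROC_BIND=true, OMP_PLACES=cores run.
bound = {
    (0, 0): {0, 16}, (0, 1): {1, 17},
    (1, 0): {8, 24}, (1, 1): {9, 25},
}

# Keys are (rank, thread) pairs.
print("bound overlaps:  ", list(shared_cpus(bound)))
print("unbound overlaps:", list(shared_cpus(unbound)))
```

With binding enabled no two threads share a CPU. Without it, the two threads of each rank share their whole mask, while threads of different ranks never overlap.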

Again, this might depend on the specific OpenMP and MPI implementations and versions, but with the approach described above you should be able to figure out what is going on.

Hope that helps.

noma answered Oct 02 '22 22:10
