OpenMP and MPI hybrid program

Tags:

mpi

openmp

I have a machine with 8 processors. I want to alternate between OpenMP and MPI in my code like this:

OpenMP phase:

  • ranks 1-7 wait on an MPI_Barrier
  • rank 0 uses all 8 processors with OpenMP

MPI phase:

  • rank 0 reaches the barrier and all ranks use one processor each

So far, I've done:

  • set I_MPI_WAIT_MODE to 1 so that ranks 1-7 don't use the CPU while waiting on the barrier.
  • called omp_set_num_threads(8) on rank 0 so that it launches 8 OpenMP threads.

This almost worked: rank 0 did launch 8 threads, but they are all confined to a single processor. During the OpenMP phase I get 8 threads from rank 0 running on one processor while all the other processors are idle.

How do I tell MPI to allow rank 0 to use the other processors? I am using Intel MPI, but could switch to OpenMPI or MPICH if needed.
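For reference, here is a simplified sketch of the structure I have in mind (compute_phase is just a placeholder for the real work, and I request MPI_THREAD_FUNNELED since only the master thread of rank 0 makes MPI calls):

#include <mpi.h>
#include <omp.h>

// Placeholder for the actual work done in each phase
void compute_phase(void) { /* ... */ }

int main(int argc, char **argv)
{
    int provided, rank;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // OpenMP phase: rank 0 uses 8 threads, ranks 1-7 wait on the barrier
    if (rank == 0) {
        omp_set_num_threads(8);
        #pragma omp parallel
        {
            compute_phase();
        }
    }
    MPI_Barrier(MPI_COMM_WORLD);

    // MPI phase: every rank computes on its own processor
    compute_phase();

    MPI_Finalize();
    return 0;
}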

Asked Oct 30 '12 by Italo


1 Answer

The following code shows how to save the CPU affinity mask before the OpenMP part, alter it to allow all CPUs for the duration of the parallel region, and then restore the previous mask. The code is Linux-specific and only makes sense if process pinning is enabled by the MPI library: in Open MPI it is activated by passing --bind-to-core or --bind-to-socket to mpiexec; in Intel MPI pinning is the default in 4.x and is controlled by I_MPI_PIN (set it to disable to turn pinning off).

#define _GNU_SOURCE

#include <sched.h>

...

cpu_set_t *oldmask, *mask;
size_t size;
int nrcpus = 256; // 256 cores should be more than enough
int i;

// Save the old affinity mask
oldmask = CPU_ALLOC(nrcpus);
size = CPU_ALLOC_SIZE(nrcpus);
CPU_ZERO_S(size, oldmask);
if (sched_getaffinity(0, size, oldmask) == -1) { /* handle the error */ }

// Temporarily allow running on all processors
mask = CPU_ALLOC(nrcpus);
CPU_ZERO_S(size, mask);
for (i = 0; i < nrcpus; i++)
   CPU_SET_S(i, size, mask);
if (sched_setaffinity(0, size, mask) == -1) { /* handle the error */ }

#pragma omp parallel
{
}

CPU_FREE(mask);

// Restore the saved affinity mask
if (sched_setaffinity(0, size, oldmask) == -1) { /* handle the error */ }

CPU_FREE(oldmask);

...

You can also tweak the pinning settings of the OpenMP runtime itself. With GCC/libgomp the thread affinity is controlled by the GOMP_CPU_AFFINITY environment variable, while for the Intel compilers it is KMP_AFFINITY. You can still use the code above if the OpenMP runtime intersects the supplied affinity mask with that of the process.
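If you want to verify that the mask was actually widened, a small Linux-only helper along these lines can help (a sketch; print_allowed_cpus is just an illustrative name and the 256-CPU limit mirrors the code above):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

// Print the number of CPUs the calling process/thread may currently run on
static void print_allowed_cpus(const char *label)
{
    int nrcpus = 256;
    cpu_set_t *set = CPU_ALLOC(nrcpus);
    size_t size = CPU_ALLOC_SIZE(nrcpus);

    CPU_ZERO_S(size, set);
    if (sched_getaffinity(0, size, set) == 0)
        printf("%s: allowed on %d CPU(s)\n", label, CPU_COUNT_S(size, set));
    CPU_FREE(set);
}

Calling it right after sched_setaffinity() and again from inside the parallel region shows whether the process mask and the OpenMP threads really see all processors.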

Just for the sake of completeness - saving, setting and restoring the affinity mask on Windows:

#include <windows.h>

...

HANDLE hCurrentProc, hDupCurrentProc;
DWORD_PTR dwpSysAffinityMask, dwpProcAffinityMask;

// Obtain a usable handle of the current process
hCurrentProc = GetCurrentProcess();
DuplicateHandle(hCurrentProc, hCurrentProc, hCurrentProc,
                &hDupCurrentProc, 0, FALSE, DUPLICATE_SAME_ACCESS);

// Get the old affinity mask
GetProcessAffinityMask(hDupCurrentProc,
                       &dwpProcAffinityMask, &dwpSysAffinityMask);

// Temporarily allow running on all CPUs in the system affinity mask
SetProcessAffinityMask(hDupCurrentProc, dwpSysAffinityMask);

#pragma omp parallel
{
}

// Restore the old affinity mask
SetProcessAffinityMask(hDupCurrentProc, dwpProcAffinityMask);

CloseHandle(hDupCurrentProc);

...

This should work with a single processor group (up to 64 logical processors).
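If you are not sure whether the machine has more than one processor group (i.e. more than 64 logical processors), a quick check like this can be used (a sketch; requires Windows 7 / Server 2008 R2 or newer):

#define _WIN32_WINNT 0x0601   // needed for GetActiveProcessorGroupCount()
#include <windows.h>
#include <stdio.h>

int main(void)
{
    // More than one group means more than 64 logical processors;
    // the affinity mask trick above then only covers the process's own group
    WORD groups = GetActiveProcessorGroupCount();
    printf("active processor groups: %u\n", (unsigned)groups);
    return 0;
}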

Answered Sep 29 '22 by Hristo Iliev