I have a machine with 8 processors. I want to alternate using OpenMP and MPI on my code like this:
OpenMP phase:
MPI phase:
So far, I've done:
It all worked. Rank 0 did launch 8 threads, but all of them were confined to a single processor: during the OpenMP phase I get 8 threads from rank 0 running on one processor while all the other processors sit idle.
How do I tell MPI to allow rank 0 to use the other processors? I am using Intel MPI, but could switch to OpenMPI or MPICH if needed.
The hybrid MPI+OpenMP paradigm is the usual approach for clusters of SMP nodes. It is elegant in concept: use OpenMP within each node and MPI between nodes, so that shared resources are used well and unnecessary intra-node MPI communication is avoided, with OpenMP providing the fine-grained parallelism. MPI and OpenMP can be used at the same time to create a hybrid MPI/OpenMP program.
Quite often the question arises as to which of the two is faster or more efficient at reducing the processing time of an algorithm. The short answer is that, when run with their most basic settings on a simple computational load, MPI and OpenMP are roughly equally efficient.
The following code shows how to save the CPU affinity mask before the OpenMP part, alter it to allow all CPUs for the duration of the parallel region, and then restore the previous CPU affinity mask afterwards. The code is Linux-specific, and it makes no sense unless you enable process pinning in the MPI library: in Open MPI pinning is activated by passing --bind-to-core or --bind-to-socket to mpiexec; in Intel MPI it is deactivated by setting I_MPI_PIN to disable (the default on 4.x is to pin processes).
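For reference, launch commands along the following lines enable the pinning described above. These are sketches, not tested invocations: the executable name hybrid_app is a placeholder, and the option spellings follow the versions mentioned in the text (older Open MPI, Intel MPI 4.x).

```shell
# Open MPI: pin each rank to a core (or to a socket)
mpiexec --bind-to-core -n 8 ./hybrid_app
mpiexec --bind-to-socket -n 8 ./hybrid_app

# Intel MPI 4.x: pinning is on by default; to turn it off explicitly:
I_MPI_PIN=disable mpiexec -n 8 ./hybrid_app
```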
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>   // perror
#include <stdlib.h>  // exit

...

cpu_set_t *oldmask, *mask;
size_t size;
int nrcpus = 256; // 256 cores should be more than enough
int i;

// Save the old affinity mask
oldmask = CPU_ALLOC(nrcpus);
size = CPU_ALLOC_SIZE(nrcpus);
CPU_ZERO_S(size, oldmask);
if (sched_getaffinity(0, size, oldmask) == -1)
{
   perror("sched_getaffinity");
   exit(EXIT_FAILURE);
}

// Temporarily allow running on all processors
mask = CPU_ALLOC(nrcpus);
CPU_ZERO_S(size, mask); // CPU_ALLOC does not zero the new set
for (i = 0; i < nrcpus; i++)
   CPU_SET_S(i, size, mask);
if (sched_setaffinity(0, size, mask) == -1)
{
   perror("sched_setaffinity");
   exit(EXIT_FAILURE);
}

#pragma omp parallel
{
   // OpenMP phase goes here
}

CPU_FREE(mask);

// Restore the saved affinity mask
if (sched_setaffinity(0, size, oldmask) == -1)
{
   perror("sched_setaffinity");
   exit(EXIT_FAILURE);
}
CPU_FREE(oldmask);

...
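The snippet above can be packaged into a pair of helper functions, sketched below for Linux/glibc. widen_affinity and restore_affinity are hypothetical names chosen for this example, not part of any MPI or OpenMP API:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stddef.h>

/* State saved between widen_affinity() and restore_affinity(). */
static cpu_set_t *saved_mask = NULL;
static size_t saved_size = 0;

/* Save the current affinity mask of the calling thread, then allow it
   to run on all CPUs. Returns 0 on success, -1 on error. */
int widen_affinity(int nrcpus)
{
    if (saved_mask != NULL)
        return -1;  /* already widened; restore first */

    saved_size = CPU_ALLOC_SIZE(nrcpus);
    saved_mask = CPU_ALLOC(nrcpus);
    if (saved_mask == NULL)
        return -1;
    CPU_ZERO_S(saved_size, saved_mask);
    if (sched_getaffinity(0, saved_size, saved_mask) == -1)
        return -1;

    cpu_set_t *mask = CPU_ALLOC(nrcpus);
    if (mask == NULL)
        return -1;
    CPU_ZERO_S(saved_size, mask);        /* CPU_ALLOC does not zero */
    for (int i = 0; i < nrcpus; i++)
        CPU_SET_S(i, saved_size, mask);  /* allow every CPU */
    int rc = sched_setaffinity(0, saved_size, mask);
    CPU_FREE(mask);
    return rc;
}

/* Restore the affinity mask saved by widen_affinity().
   Returns 0 on success, -1 on error or if nothing was saved. */
int restore_affinity(void)
{
    if (saved_mask == NULL)
        return -1;
    int rc = sched_setaffinity(0, saved_size, saved_mask);
    CPU_FREE(saved_mask);
    saved_mask = NULL;
    return rc;
}
```

Usage then reduces to calling widen_affinity(256) just before the OpenMP parallel region and restore_affinity() just after it, before the next MPI phase.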
You can also tweak the pinning arguments of the OpenMP run-time. For GCC/libgomp
the affinity is controlled by the GOMP_CPU_AFFINITY environment variable, while for Intel compilers it is KMP_AFFINITY. You can still use the code above if the OpenMP run-time intersects the supplied affinity mask with that of the process.
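As an illustration, settings along these lines pin the OpenMP threads explicitly. The CPU list and the KMP_AFFINITY policy are examples for an 8-core machine, not recommendations:

```shell
# GCC/libgomp: bind threads to CPUs 0 through 7, in that order
export GOMP_CPU_AFFINITY="0-7"

# Intel compilers: bind threads to consecutive cores
export KMP_AFFINITY="granularity=fine,compact"
```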
Just for the sake of completeness - saving, setting and restoring the affinity mask on Windows:
#include <windows.h>

...

HANDLE hCurrentProc, hDupCurrentProc;
DWORD_PTR dwpSysAffinityMask, dwpProcAffinityMask;

// Obtain a usable handle of the current process
hCurrentProc = GetCurrentProcess();
DuplicateHandle(hCurrentProc, hCurrentProc, hCurrentProc,
   &hDupCurrentProc, 0, FALSE, DUPLICATE_SAME_ACCESS);

// Get the old affinity mask
GetProcessAffinityMask(hDupCurrentProc,
   &dwpProcAffinityMask, &dwpSysAffinityMask);

// Temporarily allow running on all CPUs in the system affinity mask
// (note: SetProcessAffinityMask takes the mask by value, not by pointer)
SetProcessAffinityMask(hDupCurrentProc, dwpSysAffinityMask);

#pragma omp parallel
{
   // OpenMP phase goes here
}

// Restore the old affinity mask
SetProcessAffinityMask(hDupCurrentProc, dwpProcAffinityMask);

CloseHandle(hDupCurrentProc);

...
This should work on systems with a single processor group, i.e. with up to 64 logical processors.