I want to bind the threads in my code to each physical core.
With GCC I have successfully done this using sched_setaffinity
so I no longer have to set export OMP_PROC_BIND=true
. I want to do the same thing in Windows with MSVC. Windows and Linux using a different thread topology. Linux scatters the threads while windows uses a compact form. In other words in Linux with four cores and eight hyper-threads I only need to bind the threads to the first four processing units. In windows I set them to to every other processing unit.
I have successfully done this using SetProcessAffinityMask
. I can see from Windows Task Manger when I right click on the processes and click "Set Affinity" that every other CPU is set (0, 2, 4, 6 on my eight hyper thread system). The problem is that the efficiency of my code is unstable when I run. Sometimes it's nearly constant but most of the time it has big changes. I changed the priority to high but it makes no difference. In Linux the efficiency is stable. Maybe Windows is still migrating the threads? Is there something else I need to do to bind the threads in Windows?
Here is the code I'm using
#ifdef _WIN32
HANDLE process;
DWORD_PTR processAffinityMask = 0;
//Windows uses a compact thread topology. Set mask to every other thread
for(int i=0; i<ncores; i++) processAffinityMask |= 1<<(2*i);
//processAffinityMask = 0x55;
process = GetCurrentProcess();
SetProcessAffinityMask(process, processAffinityMask);
#else
cpu_set_t mask;
CPU_ZERO(&mask);
for(int i=0; i<ncores; i++) CPU_SET(i, &mask);
sched_setaffinity(0, sizeof(mask), &mask);
#endif
Edit: here is the code I used now which seems to be stable on Linux and Windows
#ifdef _WIN32
HANDLE process;
DWORD_PTR processAffinityMask;
//Windows uses a compact thread topology. Set mask to every other thread
for(int i=0; i<ncores; i++) processAffinityMask |= 1<<(2*i);
process = GetCurrentProcess();
SetProcessAffinityMask(process, processAffinityMask);
#pragma omp parallel
{
HANDLE thread = GetCurrentThread();
DWORD_PTR threadAffinityMask = 1<<(2*omp_get_thread_num());
SetThreadAffinityMask(thread, threadAffinityMask);
}
#else
cpu_set_t mask;
CPU_ZERO(&mask);
for(int i=0; i<ncores; i++) CPU_SET(i, &mask);
sched_setaffinity(0, sizeof(mask), &mask);
#pragma omp parallel
{
cpu_set_t mask;
CPU_ZERO(&mask);
CPU_SET(omp_get_thread_num(),&mask);
pthread_setaffinity_np(pthread_self(), sizeof(mask), &mask);
}
#endif
You should use the SetThreadAffinityMask
function (see MSDN reference). You are setting the process's mask.
You can obtain a thread ID
in OpenMP with this code:
int tid = omp_get_thread_num();
However the code above provides OpenMP's internal thread ID
, and not the system thread ID
. This article explains more on the subject:
http://msdn.microsoft.com/en-us/magazine/cc163717.aspx
if you need to explicitly work with those trheads - use the explicit affinity type
as explained in this Intel documentation:
https://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/optaps/common/optaps_openmp_thread_affinity.htm
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With