
What is the performance impact of having more OpenMP threads than work?

Consider the following example where the individual jobs are independent (no synchronization needed between the threads):

#pragma omp parallel num_threads(N)
{
    #pragma omp for schedule(dynamic) nowait
    for (int i = 0; i < jobs; ++i)
    {
        // ... independent work for job i ...
    }
}

If N = 4 and jobs = 3, I doubt there will be much of a performance hit from creating and destroying the one extra thread. But if N = 32, I'm wondering about the impact of creating and destroying the unused threads. Is it something we should even worry about?

asked Oct 14 '15 at 15:10 by RyGuyinCA




1 Answer

First of all, the most general way to express your code is:

#pragma omp parallel for schedule(dynamic)
for (int i = 0; i < jobs; ++i)
{
}

Assume that the implementation picks a good default for the number of threads.
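A minimal runnable sketch of that general form, assuming a hypothetical `do_job` as the independent per-job work. No `num_threads` clause is given, so the team size is left to the implementation; `omp_get_max_threads()` reports the upper bound it would use. The `#ifdef _OPENMP` guards let the same file build (serially) without `-fopenmp`.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for the real per-job work; jobs are independent.
int do_job(int i) { return i * i; }

// Runs `jobs` independent jobs with the implementation's default thread count.
std::vector<int> run_jobs(int jobs) {
    assert(jobs >= 0);
    std::vector<int> results(jobs, -1);
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < jobs; ++i) {
        results[i] = do_job(i);  // each iteration writes its own slot, so no data race
    }
    return results;
}

// Upper bound on the team size of a parallel region without num_threads
// (1 when compiled without OpenMP).
int default_thread_count() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```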

Before you go any further, measure. Sometimes it is necessary to help out the implementation, but don't do it blindly. Most of what follows is implementation-dependent, so the standard won't help you much.
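One way to measure is a small timing harness around the question's region. This is a sketch under assumptions: `work` is a hypothetical tiny job, and the helper names are illustrative. One untimed warm-up region runs first because, in common implementations, that is where the worker threads get created; later regions reuse them.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical stand-in for one small, independent job.
double work(int i) {
    double s = 0.0;
    for (int k = 0; k < 1000; ++k) s += (i + 1) * 1e-3 * k;
    return s;
}

// The question's region: `jobs` jobs on a team of `nthreads` threads.
// Results are written to `out` so the compiler cannot discard the work.
void run_region(int nthreads, int jobs, std::vector<double>& out) {
    (void)nthreads;  // only used by the pragma; silences warnings without OpenMP
    #pragma omp parallel num_threads(nthreads)
    {
        #pragma omp for schedule(dynamic) nowait
        for (int i = 0; i < jobs; ++i) out[i] = work(i);
    }
}

// Average wall-clock microseconds per region over `reps` repetitions,
// measured after one untimed warm-up region.
double time_region_us(int nthreads, int jobs, int reps, std::vector<double>& out) {
    assert(nthreads >= 1 && jobs >= 0 && reps >= 1);
    out.assign(jobs, 0.0);
    run_region(nthreads, jobs, out);  // warm-up, not timed
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) run_region(nthreads, jobs, out);
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::micro> elapsed = stop - start;
    return elapsed.count() / reps;
}
```

Compile with `-fopenmp` (GCC/Clang) and compare `time_region_us(4, 3, ...)` with `time_region_us(32, 3, ...)` on the target machine, ideally under both `OMP_WAIT_POLICY` settings.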

If you still specify the number of threads manually, you might as well give it std::min(N, jobs), so that no thread is started without a job to run.
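A sketch of capping the team at the smaller of `N` and `jobs`. The function name is hypothetical; it records which thread ran each job so the cap can be checked. `jobs` must be at least 1 here, because `num_threads(0)` is not a valid request.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Runs `jobs` independent jobs on at most std::min(N, jobs) threads and
// returns the thread number that executed each job (0 without OpenMP).
std::vector<int> run_capped(int N, int jobs) {
    assert(N >= 1 && jobs >= 1);  // num_threads requires a positive value
    const int nthreads = std::min(N, jobs);  // never request more threads than jobs
    (void)nthreads;  // only used by the pragma; silences warnings without OpenMP
    std::vector<int> owner(jobs, -1);
    #pragma omp parallel for num_threads(nthreads) schedule(dynamic)
    for (int i = 0; i < jobs; ++i) {
#ifdef _OPENMP
        owner[i] = omp_get_thread_num();
#else
        owner[i] = 0;
#endif
    }
    return owner;
}
```

With `N = 32` and `jobs = 3`, every recorded thread number is below 3, so no idle threads join the team.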

Here are some things to look out for that could influence the performance in your case:

  • Don't worry too much about the overhead of spawning unnecessary threads. Implementations mitigate it with thread pools. That doesn't mean it's always perfect, so measure.
  • Do not oversubscribe unless you know what you are doing: use at most as many threads as there are cores. This is general advice.
  • The OMP_WAIT_POLICY setting matters in your case because it defines how waiting threads behave. Here the excess threads will wait at the implicit barrier at the end of the parallel region. Implementations are free to interpret the setting as they like, but you may assume that with active, threads use some form of busy waiting, and with passive, threads sleep. A busy-waiting thread can take resources from the computing threads, e.g. power budget that could otherwise raise their turbo frequency. Busy-waiting threads also waste energy. With oversubscription, the impact of active threads is much worse.
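The last two points can be sketched as follows, under stated assumptions: the helper names are hypothetical, and `omp_get_num_procs()` counts processors available to the program, which on SMT machines usually means hardware threads rather than physical cores. `OMP_WAIT_POLICY` is only reported, not set: the OpenMP specification says changes to its environment variables after the program has started are ignored, so it has to be set at launch (e.g. `OMP_WAIT_POLICY=passive ./app`).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <string>
#ifdef _OPENMP
#include <omp.h>
#endif

// Processors available to the program (1 when compiled without OpenMP).
int available_procs() {
#ifdef _OPENMP
    return omp_get_num_procs();
#else
    return 1;
#endif
}

// Thread count that neither exceeds the number of jobs nor oversubscribes
// the available processors.
int choose_threads(int requested, int jobs, int procs) {
    assert(requested >= 1 && jobs >= 1 && procs >= 1);
    return std::min({requested, jobs, procs});
}

// Reports the OMP_WAIT_POLICY value the program was launched with.
// Calling setenv() from inside the program would not change the policy.
std::string wait_policy() {
    const char* p = std::getenv("OMP_WAIT_POLICY");
    return p ? std::string(p) : std::string("(unset: implementation default)");
}
```

A typical call would be `choose_threads(N, jobs, available_procs())`, passed to the `num_threads` clause.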
answered Sep 29 '22 at 00:09 by Zulan