Consider the following example where the individual jobs are independent (no synchronization needed between the threads):
#pragma omp parallel num_threads(N)
{
#pragma omp for schedule(dynamic) nowait
for (int i = 0; i < jobs; ++i)
{
...
}
}
If N = 4 and jobs = 3, I doubt there will be much of a performance hit from creating and destroying the one extra thread, but if N = 32, then I'm wondering about the impact of creating/destroying the 29 unused threads. Is this something we should even worry about?
When there are more software threads than hardware threads, the threads tend to evict each other's cached data, and this cache fighting can hurt performance. A similar overhead, at a different level, is thrashing virtual memory.
The OMP_THREAD_LIMIT environment variable sets the maximum number of OpenMP threads to use for the whole OpenMP program. The default in Sun's implementation is 1024. If this environment variable is set to one, then all parallel regions will be executed by one thread.
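For example, the limit can be set per run from the shell. Here `./my_program` is a hypothetical binary name standing in for your compiled OpenMP executable:

```shell
# Hypothetical binary name; OMP_THREAD_LIMIT caps thread count for the whole program.
OMP_THREAD_LIMIT=1 ./my_program   # all parallel regions run on a single thread
```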
At this point in time, the OpenMP specification doesn't give the user any ability to control when threads are destroyed. What you are saying is very interesting and hasn't been brought up during any of the OpenMP language committee meetings to discuss the specification.
First of all, the most general way to express your code is:
#pragma omp parallel for schedule(dynamic)
for (int i = 0; i < jobs; ++i)
{
}
Assume that the implementation has a good default. Before you go any further, measure. Sure, sometimes it can be necessary to help out the implementation, but don't do so blindly. Most of what follows is implementation-dependent, so looking at the standard doesn't help you much.
If you still manually specify the number of threads, you might as well give it std::min(N, jobs) so that no more threads are created than there are jobs.
Here are some things to look out for that could influence performance in your case:
OMP_WAIT_POLICY matters in your case, as it defines how waiting threads behave. Here, the excess threads will wait at the implicit barrier at the end of the parallel region. Implementations are free to do what they want with this setting, but you may assume that with active, threads use some form of busy waiting, and with passive, threads sleep. A busy-waiting thread can consume resources of the computing threads, e.g. power budget that could otherwise be used to increase the turbo frequency of the computing threads; it also wastes energy. In case of oversubscription, the impact of actively waiting threads is much worse.
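If the excess threads turn out to matter, a quick experiment is to time the same run under both policies. As before, `./my_program` is a hypothetical binary name:

```shell
# Hypothetical binary; compare busy-waiting vs. sleeping excess threads.
OMP_WAIT_POLICY=active  ./my_program
OMP_WAIT_POLICY=passive ./my_program
```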