What is the performance impact of having more OpenMP threads than work?

Tags: c++, multithreading, openmp

Consider the following example where the individual jobs are independent (no synchronization needed between the threads):

#pragma omp parallel num_threads(N)
{
    #pragma omp for schedule(dynamic) nowait
    for (int i = 0; i < jobs; ++i)
    {
        ...
    }
}

If N = 4 and jobs = 3, I doubt there will be much of a performance hit from creating and destroying the one extra thread, but if N = 32, I'm wondering about the impact of creating and destroying the unused threads. Is it something we should even worry about?


asked Oct 14 '15 15:10

RyGuyinCA

1 Answer

First of all, the most general way to express your code is:

#pragma omp parallel for schedule(dynamic)
for (int i = 0; i < jobs; ++i)
{
}

Assume that the implementation has a good default.

Before you go any further, measure. Sure, sometimes it is necessary to help out the implementation, but don't do that blindly. Most of what follows is implementation-dependent, so looking at the standard won't help you much.

If you still want to specify the number of threads manually, you might as well give it std::min(N, jobs), so that no more threads are requested than there are jobs.
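Capping the team size at the number of jobs can be sketched as follows. This is only an illustration, not code from the question: the helper names clamp_threads and run_jobs are hypothetical, and the loop body just records which thread handled each job as a stand-in for real work. The _OPENMP guards let it compile with or without OpenMP enabled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: never request more threads than there are jobs,
// and always request at least one (num_threads needs a positive value).
int clamp_threads(int requested, int jobs) {
    assert(requested >= 1);
    return std::max(1, std::min(requested, jobs));
}

// Runs `jobs` independent iterations and records which thread handled each.
// Without OpenMP support the pragma is ignored and every entry is 0.
std::vector<int> run_jobs(int requested_threads, int jobs) {
    std::vector<int> owner(static_cast<std::size_t>(std::max(jobs, 0)), -1);
    const int team = clamp_threads(requested_threads, jobs);
    (void)team;  // avoids an unused-variable warning when OpenMP is disabled

    #pragma omp parallel for schedule(dynamic) num_threads(team)
    for (int i = 0; i < jobs; ++i) {
#ifdef _OPENMP
        owner[i] = omp_get_thread_num();  // stand-in for the real job
#else
        owner[i] = 0;
#endif
    }
    return owner;
}
```

Calling run_jobs(32, 3) requests a team of at most 3 threads instead of 32. The runtime may still hand out fewer threads than requested, so whether this actually helps is something to measure.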

Here are some things to look out for that could influence performance in your case:

Don't worry too much about the overhead of spawning unnecessary threads. Implementations mitigate it with thread pools. That doesn't mean it's always perfect - so measure.
Do not oversubscribe unless you know what you are doing. Use at most as many threads as there are cores. This is general advice.
OMP_WAIT_POLICY matters in your case because it defines how waiting threads behave. The excess threads will wait at the implicit barrier at the end of the parallel region. Implementations are free to interpret the setting as they see fit, but you may assume that with active, threads use some form of busy waiting, and with passive, threads sleep. A busy-waiting thread can consume resources that the computing threads need, e.g. power budget that could otherwise be used to raise the turbo frequency of the computing threads. Busy-waiting threads also waste energy. With oversubscription, the impact of active waiting is much worse.
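Since most of these points come down to measuring, here is a minimal timing sketch. It assumes a trivial loop body as a placeholder for the real jobs, and the names Timing and time_regions are hypothetical. It times repeated parallel regions for a given team size so that, for example, 32 threads can be compared against 3 for the same 3 jobs.

```cpp
#include <cassert>
#include <chrono>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical result type: elapsed wall-clock time plus a checksum that
// keeps the compiler from optimizing the placeholder work away.
struct Timing {
    double seconds;
    long long checksum;
};

// Times `repeats` parallel regions, each running `jobs` iterations on a
// team of `threads`. Without OpenMP the pragma is ignored and the loop
// runs serially.
Timing time_regions(int threads, int jobs, int repeats) {
    assert(threads >= 1);
    (void)threads;  // unused when compiled without OpenMP and with NDEBUG
    long long sink = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        #pragma omp parallel for schedule(dynamic) num_threads(threads) reduction(+:sink)
        for (int i = 0; i < jobs; ++i) {
            sink += i;  // placeholder for a real, independent job
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    return Timing{std::chrono::duration<double>(stop - start).count(), sink};
}
```

One way to use it is to compare time_regions(32, 3, 1000) with time_regions(3, 3, 1000), once with the environment variable OMP_WAIT_POLICY set to active and once with passive before launching the program. The numbers depend on the implementation and the machine, so treat them as a starting point rather than a verdict.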


answered Sep 29 '22 00:09

Zulan
