OpenMP vs C++11 threads

Tags: c++, multithreading, c++11

In the following example, the C++11 threads take about 50 seconds to execute, but the OpenMP threads take only 5 seconds. Any ideas why? (I can assure you it still holds true if you do real work instead of doNothing, or if you run the two versions in a different order, etc.) I'm on a 16-core machine, too.

```cpp
#include <chrono>
#include <iostream>
#include <thread>
#include <vector>
#include <omp.h>

using namespace std;

void doNothing() {}

// Returns the elapsed wall-clock time in seconds. It returns double because
// an int return value would truncate the measurement.
double run(int algorithmToRun)
{
    // steady_clock is monotonic, which suits interval timing better than system_clock.
    auto startTime = chrono::steady_clock::now();

    for (int j = 1; j < 100000; ++j)
    {
        if (algorithmToRun == 1)
        {
            // Create and join 16 brand-new threads on every iteration.
            vector<thread> threads;
            for (int i = 0; i < 16; i++)
            {
                threads.emplace_back(doNothing);
            }
            for (auto& t : threads) t.join();
        }
        else if (algorithmToRun == 2)
        {
            // Let the OpenMP runtime spread 16 iterations over 16 threads.
            #pragma omp parallel for num_threads(16)
            for (unsigned i = 0; i < 16; i++)
            {
                doNothing();
            }
        }
    }

    auto endTime = chrono::steady_clock::now();
    chrono::duration<double> elapsed_seconds = endTime - startTime;

    return elapsed_seconds.count();
}

int main()
{
    double cppt = run(1);
    double ompt = run(2);

    cout << cppt << endl;
    cout << ompt << endl;

    return 0;
}
```

asked Apr 24 '14 01:04

user2588666

2 Answers

OpenMP uses a thread pool for its pragmas: the runtime creates its worker threads once and reuses them for later parallel regions. Spinning up and tearing down threads is expensive. OpenMP avoids this overhead, so all it does is the actual work plus the minimal shared-memory shuttling of execution state. In your std::thread code, you spin up and tear down a new set of 16 threads on every iteration.

answered Oct 11 '22 14:10

aruisdante

I tried the 100-iteration loop benchmark from the article "Choosing the right threading framework": it took 0.0727 ms with OpenMP, 0.6759 ms with Intel TBB, and 0.5962 ms with the C++ thread library.

I also applied what aruisdante suggested:

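```cpp
// doNothing(int), max_i and band are not defined in the question's code;
// they presumably come from the benchmark in the article above.
void nested_loop(int max_i, int band)
{
    for (int i = 0; i < max_i; i++)
    {
        doNothing(band);
    }
}
...
else if (algorithmToRun == 5)
{
    // A single thread runs the entire nested loop.
    thread bristle(nested_loop, max_i, band);
    bristle.join();
}
```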

This code appears to take less time than the C++11 thread section of your original code.

answered Oct 11 '22 14:10

Cloud Cho
