In the following example the C++11 threads take about 50 seconds to execute, but the OMP threads only 5 seconds. Any ideas why? (I can assure you it still holds true if you are doing real work instead of doNothing
, or if you do it in a different order, etc.) I'm on a 16 core machine, too.
#include <iostream> #include <omp.h> #include <chrono> #include <vector> #include <thread> using namespace std; void doNothing() {} int run(int algorithmToRun) { auto startTime = std::chrono::system_clock::now(); for(int j=1; j<100000; ++j) { if(algorithmToRun == 1) { vector<thread> threads; for(int i=0; i<16; i++) { threads.push_back(thread(doNothing)); } for(auto& thread : threads) thread.join(); } else if(algorithmToRun == 2) { #pragma omp parallel for num_threads(16) for(unsigned i=0; i<16; i++) { doNothing(); } } } auto endTime = std::chrono::system_clock::now(); std::chrono::duration<double> elapsed_seconds = endTime - startTime; return elapsed_seconds.count(); } int main() { int cppt = run(1); int ompt = run(2); cout<<cppt<<endl; cout<<ompt<<endl; return 0; }
OpenMP threads cannot be killed forcefully from the outside. They do not have a handle that you can use to perform operations like join, interrupt, abort, etc. In fact, OpenMP is not even designed for this.
When run, an OpenMP program will use one thread (in the sequential sections), and several threads (in the parallel sections). There is one thread that runs from the beginning to the end, and it's called the master thread. The parallel sections of the program will cause additional threads to fork.
The OpenMP standard was formulated in 1997 as an API for writing portable, multithreaded applications. It started as a Fortran-based standard, but later grew to include C and C++. The current version is OpenMP 2.0, and Visual C++® 2005 supports the full standard.
What is OpenMP. OpenMP is an implementation of multithreading, a method of parallelizing whereby a master thread (a series of instructions executed consecutively) forks a specified number of slave threads and the system divides a task among them.
OpenMP thread-pools for its Pragmas (also here and here). Spinning up and tearing down threads is expensive. OpenMP avoids this overhead, so all it's doing is the actual work and the minimal shared-memory shuttling of the execution state. In your Threads
code you are spinning up and tearing down a new set of 16 threads every iteration.
I tried a code of an 100 looping at Choosing the right threading framework and it took OpenMP 0.0727, Intel TBB 0.6759 and C++ thread library 0.5962 mili-seconds.
I also applied what AruisDante suggested;
void nested_loop(int max_i, int band) { for (int i = 0; i < max_i; i++) { doNothing(band); } } ... else if (algorithmToRun == 5) { thread bristle(nested_loop, max_i, band); bristle.join(); }
This code looks like taking less time than your original C++ 11 thread section.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With