
Why is OpenMP outperforming threads?

I've been running this loop with OpenMP:

#pragma omp parallel for num_threads(totalThreads)
for(unsigned i=0; i<totalThreads; i++)
{
    workOnTheseEdges(startIndex[i], endIndex[i]);
}

And this with C++11 std::thread (which I believe is just a wrapper around pthreads):

vector<thread> threads;
for(unsigned i=0; i<totalThreads; i++)
{
    threads.push_back(thread(workOnTheseEdges, startIndex[i], endIndex[i]));
}
for (auto& thread : threads)
{
    thread.join();
}

But the OpenMP implementation is about twice as fast! I would have expected the C++11 threads to be faster, since they are lower-level. Note: the code above is called not just once but probably 10,000 times in a loop, so maybe that has something to do with it?

Edit: to clarify, in practice I use either the OpenMP version or the C++11 version, not both. With the OpenMP code the run takes 45 seconds; with the C++11 code it takes 100 seconds.

user2588666 asked Mar 19 '23 19:03


2 Answers

Where does totalThreads come from in your OpenMP version? I bet it's not startIndex.size().

The OpenMP version queues the requests onto totalThreads worker threads. It looks like the C++11 version creates startIndex.size() threads, which involves a ridiculous amount of overhead if that's a big number.
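To illustrate the difference between "one thread per work item" and "a fixed number of workers sharing the items", here is a minimal sketch with plain std::thread. The helper name runChunked and its parameters are hypothetical (they are not from the question), and the even split into contiguous chunks is only an assumption that resembles what a statically scheduled parallel for does; a real OpenMP runtime may schedule differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: call task(i) for every i in [0, taskCount) using at
// most workerCount threads, instead of spawning one thread per task.
void runChunked(std::size_t taskCount, unsigned workerCount,
                const std::function<void(std::size_t)>& task)
{
    assert(workerCount > 0);
    std::vector<std::thread> workers;

    // Ceiling division: each worker gets a contiguous chunk of indices.
    const std::size_t chunk = (taskCount + workerCount - 1) / workerCount;

    for (unsigned w = 0; w < workerCount; ++w) {
        const std::size_t begin = std::min(taskCount, w * chunk);
        const std::size_t end = std::min(taskCount, begin + chunk);
        if (begin == end) break; // no indices left for this worker

        workers.emplace_back([begin, end, &task] {
            for (std::size_t i = begin; i < end; ++i) task(i);
        });
    }

    // Wait for every worker before returning, like the implicit barrier
    // at the end of an OpenMP parallel for.
    for (auto& t : workers) t.join();
}
```

With this shape, something like runChunked(startIndex.size(), totalThreads, ...) never starts more than totalThreads threads, no matter how many index ranges there are. It still creates and joins those threads on every call, so it does not remove the per-call startup cost.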

adpalumbo answered Apr 01 '23 02:04


Consider the following code. The OpenMP version runs in roughly 0 seconds, while the C++11 version takes 50 seconds. This is not because the function is doNothing, and it is not because the vector is declared inside the loop. The C++11 threads are created and then destroyed in every iteration. OpenMP, on the other hand, actually implements thread pools. That is not in the standard, but it is in Intel's and AMD's implementations.

for(int j=1; j<100000; ++j)
{
    if(algorithmToRun == 1)
    {
        vector<thread> threads;
        for(int i=0; i<16; i++)
        {
            threads.push_back(thread(doNothing));
        }
        for(auto& thread : threads) thread.join();
    }
    else if(algorithmToRun == 2)
    {
        #pragma omp parallel for num_threads(16)
        for(unsigned i=0; i<16; i++)
        {
            doNothing();
        }
    }
}
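For comparison, here is a rough sketch of what reusing threads could look like with C++11 alone. SimplePool, submit and waitAll are hypothetical names invented for this illustration; this is not how any particular OpenMP runtime is implemented, only a minimal pool assuming a mutex-protected job queue. The workers are created once and then reused for every batch of jobs.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool: threads are started once in the constructor
// and joined once in the destructor.
class SimplePool {
public:
    explicit SimplePool(unsigned workerCount) {
        assert(workerCount > 0);
        for (unsigned i = 0; i < workerCount; ++i)
            workers_.emplace_back([this] { workerLoop(); });
    }

    ~SimplePool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (auto& w : workers_) w.join();
    }

    // Queue one job; an idle worker picks it up.
    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
            ++pending_;
        }
        wake_.notify_one();
    }

    // Block until every submitted job has finished. This plays the role
    // of the join() loop, but the threads stay alive afterwards.
    void waitAll() {
        std::unique_lock<std::mutex> lock(mutex_);
        done_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void workerLoop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return; // stopping and queue drained
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job(); // run outside the lock so workers do not serialize
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (--pending_ == 0) done_.notify_all();
            }
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_, done_;
    std::queue<std::function<void()>> jobs_;
    std::size_t pending_ = 0;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

In the benchmark above, the pool would be constructed once before the j loop; each iteration would then call pool.submit(doNothing) 16 times followed by pool.waitAll(), so no thread is created or destroyed inside the loop. Whether this closes the whole gap to OpenMP is not something this sketch measures.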
user2588666 answered Apr 01 '23 01:04