Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

OpenMP Parallel Sections Within For Loop (C++) - Overhead

I have been working on a quantum simulation. Each time step a potential function is calculated, one step of the solver is iterated, and then a series of measurements are conducted. These three processes are easily parallelizable, and I've already made sure they don't interfere with each other. Additionally there is some stuff that is fairly simple, but should not be done in parallel. An outline of the setup is shown below.

omp_set_num_threads(3);
#pragma omp parallel
{
    while (notDone) {
        #pragma omp sections
        {
            #pragma omp section
            {
                createPotential();
            }
            #pragma omp section
            {
                iterateWaveFunction();
            }
            #pragma omp section
            {
                takeMeasurements();
            }
        }
        #pragma omp single
        {
            doSimpleThings();
        }
    }
}

The code works just fine! I see a speed increase, mostly associated with the measurements running alongside the TDSE solver (about 30% speed increase). However, the program goes from using about 10% CPU (about one thread) to 35% (about three threads). This would make sense if the potential function, TDSE iterator, and measurements took equally as long, but they do not. Based on the speed increase, I would expect something on the order of 15% CPU usage.

I have a feeling this has to do with the overhead of running these three threads within the while loop. Replacing

#pragma omp sections

with

#pragma omp parallel sections

(and omitting the two lines just before the loop) changes nothing. Is there a more efficient way to run this setup? I'm not sure if the threads are constantly being recreated, or if the thread is holding up an entire core while it waits for the others to be done. If I increase the number of threads from 3 to any other number, the program uses as much resources as it wants (which could be all of the CPU) and gets no performance gain.

like image 820
Joshua M Avatar asked Nov 07 '22 03:11

Joshua M


1 Answers

I've tried many options, including using tasks instead of sections (with the same results), switching compilers, etc. As suggested by Qubit, I also tried to use std::async. This was the solution! The CPU usage dropped from about 50% to 30% (this is on a different computer from the original post, so the numbers are different -- it's a 1.5x performance gain for 1.6x CPU usage basically). This is much closer to what I expected for this computer.

For reference, here is the new code outline:

void SimulationManager::runParallel(){
    auto rV = &SimulationManager::createPotential();
    auto rS = &SimulationManager::iterateWaveFunction();
    auto rM = &SimulationManager::takeMeasurements();
    std::future<int> f1, f2, f3;
    while(notDone){
        f1 = std::async(rV, this);
        f2 = std::async(rS, this);
        f3 = std::async(rM, this);
        f1.get(); f2.get(); f3.get();
        doSimpleThings();
    }
}

The three original functions are called using std::async, and then I use the future variables f1, f2, and f3 to collect everything back to a single thread and avoid access issues.

like image 66
Joshua M Avatar answered Nov 10 '22 00:11

Joshua M