
How to nest parallel loops in a sequential loop with OpenMP

I am currently working on a matrix computation with OpenMP. I have several loops in my code, and instead of calling #pragma omp parallel for [...] for each loop (which creates all the threads and destroys them right after), I would like to create all of them at the beginning and delete them at the end of the program, to avoid that overhead. I want something like:

#pragma omp parallel
{
    #pragma omp for[...]
    for(...)

    #pragma omp for[...]
    for(...)
}
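
For concreteness, a fleshed-out (compilable) version of this outline could look like this, with placeholder array work standing in for my real loop bodies:

#include <omp.h>

#define N 1000

int main(void)
{
    static double x[N], y[N];

    #pragma omp parallel   /* one team of threads serves both loops */
    {
        #pragma omp for schedule(static)
        for (int i = 0; i < N; i++)
            x[i] = 2.0 * i;
        /* implicit barrier at the end of the first omp for */

        #pragma omp for schedule(static)
        for (int i = 0; i < N; i++)
            y[i] = x[i] + 1.0;
    }
    return 0;
}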

The problem is that some parts have to be executed by only one thread, but they sit inside a loop which itself contains loops that have to be executed in parallel... This is how it looks:

//has to be executed by only one thread
int a, b, c;
for (a = 0; a < 5; a++)
{

    //some stuff

    //loops which have to be parallelized
    #pragma omp parallel for private(b,c) schedule(static) collapse(2)
    for (b = 0; b < 8; b++)
        for (c = 0; c < 10; c++)
        {
            //some other stuff
        }

    //end of the parallel zone
    //stuff to be executed by only one thread

}

(The loop bounds are quite small in my example; in my program the number of iterations can go up to 20,000...) One of my first ideas was to do something like this:

//has to be executed by only one thread
#pragma omp parallel    //creating all the threads at the beginning
{
    #pragma omp master //or single
    {
        int a, b, c;
        for (a = 0; a < 5; a++)
        {

            //some stuff

            //loops which have to be parallelized
            #pragma omp for private(b,c) schedule(static) collapse(2)
            for (b = 0; b < 8; b++)
                for (c = 0; c < 10; c++)
                {
                    //some other stuff
                }

            //end of the parallel zone
            //stuff to be executed by only one thread

        }
    }
} //deleting all the threads

It doesn't compile; I get this error from gcc: "work-sharing region may not be closely nested inside of work-sharing, critical, ordered, master or explicit task region".

I know it surely comes from the "wrong" nesting, but I can't understand why it doesn't work. Do I need to add a barrier before the parallel zone? I am a bit lost and don't know how to solve it.

Thank you in advance for your help. Cheers.

asked Nov 21 '13 by user3014051


2 Answers

Most OpenMP runtimes don't "create all the threads and destroy them right after". The threads are created at the beginning of the first OpenMP section and destroyed when the program terminates (at least that's how Intel's OpenMP implementation does it). There's no performance advantage from using one big parallel region instead of several smaller ones.

Intel's runtime (which is open source and can be found here) has options to control what threads do when they run out of work. By default they'll spin for a while (in case the program immediately starts a new parallel section), then put themselves to sleep. If they do sleep, it will take a bit longer to wake them up for the next parallel section, but this depends on the time between regions, not on the syntax.
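
You can check this yourself by timing two back-to-back parallel regions; in a rough sketch like the one below, the second region usually starts much faster because the thread pool already exists:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    double t0 = omp_get_wtime();

    #pragma omp parallel
    {
        /* first region: the runtime creates the worker threads here */
    }

    double t1 = omp_get_wtime();

    #pragma omp parallel
    {
        /* second region: the existing threads are reused */
    }

    double t2 = omp_get_wtime();

    printf("first region:  %f s\n", t1 - t0);
    printf("second region: %f s\n", t2 - t1);
    return 0;
}

Compile with gcc -fopenmp. The exact numbers depend on the runtime and on wait-policy settings such as the standard OMP_WAIT_POLICY environment variable (or Intel's KMP_BLOCKTIME).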

answered Oct 25 '22 by pburka


In the last of your code outlines you declare a parallel region, inside it you use a master directive to ensure that only the master thread executes a block, and inside the master block you attempt to parallelise a loop across all threads. You claim to know that the compiler errors arise from incorrect nesting, but wonder why it doesn't work.

It doesn't work because distributing work to multiple threads within a region of code which only one thread will execute doesn't make any sense.

Your first pseudo-code is better, but you probably want to extend it like this:

#pragma omp parallel
{
    #pragma omp for[...]
    for(...)

    #pragma omp single
    { ... }

    #pragma omp for[...]
    for(...)
}

The single directive ensures that the block of code it encloses is executed by only one thread. Unlike the master directive, single also implies a barrier at exit; you can change this behaviour with the nowait clause.
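
Applied to the loop structure in your question, that pattern might look something like the sketch below (assuming your serial parts can simply be wrapped in single blocks):

#include <omp.h>

int main(void)
{
    #pragma omp parallel   /* threads are created once, here */
    {
        int a, b, c;
        for (a = 0; a < 5; a++)   /* every thread runs the outer loop */
        {
            #pragma omp single
            {
                /* stuff that must be executed by only one thread */
            }
            /* implicit barrier at the end of single */

            #pragma omp for private(b,c) schedule(static) collapse(2)
            for (b = 0; b < 8; b++)
                for (c = 0; c < 10; c++)
                {
                    /* some other stuff, shared among the threads */
                }
            /* implicit barrier at the end of the work-sharing loop */

            #pragma omp single
            {
                /* stuff after the parallel loops, one thread only */
            }
        }
    }
    return 0;
}

Every thread executes the outer a loop with identical bounds, so all threads encounter the same sequence of single and for constructs, and the implicit barriers keep them in step between the serial and parallel phases.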

answered Oct 25 '22 by High Performance Mark