OpenMP iteration for loop in parallel region

Sorry if the title's a bit unclear. I don't quite know how to word this.

I'm wondering if there's any way I can do the following:

#pragma omp parallel
{
    for (int i = 0; i < iterations; i++) {
        #pragma omp for
        for (int j = 0; j < N; j++)
            // Do something
    }
}

Ignoring things such as omitting private specifiers in the for loop, is there any way that I can fork threads outside of my outer loop so that I can just parallelize the inner loop? From my understanding (please do correct me if I'm wrong), all threads will execute the outer loop. I'm unsure about the behavior of the inner loop, but I think the for will distribute chunks to the threads that encounter it.

What I want to do is avoid forking/joining iterations times and instead do it just once, around the outer loop. Is this the right strategy to achieve that?

What if there were another outer loop that shouldn't be parallelized? That is...

#pragma omp parallel
{

    for (int i = 0; i < iterations; i++) {
        for(int k = 0; k < innerIterations; k++) {
            #pragma omp for
            for (int j = 0; j < N; j++)
                // Do something

            // Do something else
        }
    }
}

It'd be great if someone could point me to an example of a large application parallelized with OpenMP, so that I could better understand what strategies to employ. I can't seem to find any.

Clarification: I'm looking for solutions that do not change loop ordering or involve blocking, caching, or general performance considerations. I want to understand how this could be done in OpenMP on the loop structure as specified. The // Do something parts may or may not have dependencies; assume that they do and that you can't move things around.

asked May 08 '13 by Pochi

1 Answer

The way you handled the two for loops looks right to me, in the sense that it achieves the behavior you wanted: the outer loop is not parallelized, while the inner loop is.

To better clarify what happens, I'll try to add some notes to your code:

#pragma omp parallel
{
  // Here you have a certain number of threads, let's say M
  for (int i = 0; i < iterations; i++) {
        // Each thread enters this region and executes all the iterations 
        // from i = 0 to i < iterations. Note that i is a private variable.
        #pragma omp for
        for (int j = 0; j < N; j++) {
            // What happens here is shared among threads so,
            // according to the scheduling you choose, each thread
            // will execute a particular portion of your N iterations
        } // IMPLICIT BARRIER             
  }
}
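For completeness, here is a minimal, compilable sketch of the same pattern. The array data, its size N, the iteration count and the loop body are placeholders I made up to stand in for your // Do something:

#include <stdio.h>

#define N 1000
#define ITERATIONS 10

int main(void)
{
    static double data[N];   // zero-initialized placeholder workload

    #pragma omp parallel     // threads are forked once, here
    {
        for (int i = 0; i < ITERATIONS; i++) {  // every thread runs the outer loop
            #pragma omp for                     // the j iterations are split among the threads
            for (int j = 0; j < N; j++) {
                data[j] += 1.0;                 // stand-in for "Do something"
            }                                   // implicit barrier here
        }
    }

    printf("data[0] = %f\n", data[0]);          // prints 10.000000
    return 0;
}

Compile with OpenMP enabled (e.g. gcc -fopenmp); each element of data is updated by exactly one thread per outer iteration, and the barrier at the end of the worksharing loop keeps the outer iterations in lockstep across threads.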

The implicit barrier is a synchronization point where the threads wait for each other. As a general rule of thumb, it is therefore preferable to parallelize outer loops rather than inner loops, because that creates a single synchronization point for all iterations*N iterations instead of the iterations separate barriers you are creating above.
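If "Do something" for a given i did not depend on the other threads having finished their share of iteration i-1, the barrier itself could be dropped with the nowait clause. This is only a hedged sketch of that option; with the dependencies you asked to assume, the barrier must stay:

#pragma omp parallel
{
    for (int i = 0; i < iterations; i++) {
        #pragma omp for nowait   // no implicit barrier at the end of this worksharing loop
        for (int j = 0; j < N; j++) {
            // Do something (only safe without the barrier if it does not
            // depend on other threads having finished iteration i-1)
        }
    }
}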

answered Oct 23 '22 by Massimiliano