Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to parallelize correctly a nested for loops

I'm working with OpenMP to parallelize a scalar nested for loop:

double P[N][N];
double x=0.0,y=0.0;

for (int i=0; i<N; i++)
{
    for (int j=0; j<N; j++)
    {
        P[i][j]=someLongFunction(x,y);
        y+=1;
    }
    x+=1;
}

In this loop the important thing is that matrix P must be the same in both scalar and parallel versions:

All my possible trials didn't succeed...

like image 329
linello Avatar asked Dec 01 '11 08:12

linello


People also ask

Should I parallelize the inner loop or the outer loop?

If you parallelize the inner loop, you will not receive a gain in performance because the small amount of work that the inner loop performs does not overcome the overhead for parallel processing. Therefore, parallelizing the outer loop only is the best way to maximize the benefits of concurrency on most systems.

Is it possible to parallelize a loop in Python?

From your comment it sounds like each loop of t is independent so we should be free to parallelize this. When python creates a new process you get a copy of the main process so you have available all your global data but when each process writes the data, it writes to it's own local copy.

How do I compile and run parallel-matrix-multiply code?

To compile the code, copy it and then paste it in a Visual Studio project, or paste it in a file that is named parallel-matrix-multiply.cpp and then run the following command in a Visual Studio Command Prompt window.


1 Answers

The problem here is that you have added iteration-to-iteration dependencies with:

x+=1;
y+=1;

Therefore, as the code stands right now, it is not parallelizable. Attempting to do so will result in incorrect results. (as you are probably seeing)

Fortunately, in your case, you can directly compute them without introducing this dependency:

for (int i=0; i<N; i++)
{
    for (int j=0; j<N; j++)
    {
        P[i][j]=someLongFunction((double)i, (double)N*i + j);
    }
}

Now you can try throwing an OpenMP pragma over this and see if it works:

#pragma omp parallel for
for (int i=0; i<N; i++)
{
    for (int j=0; j<N; j++)
    {
        P[i][j]=someLongFunction((double)i, (double)N*i + j);
    }
}
like image 190
Mysticial Avatar answered Sep 20 '22 16:09

Mysticial