I'm working with OpenMP to parallelize a scalar nested for loop:
double P[N][N];
double x=0.0, y=0.0;
for (int i=0; i<N; i++)
{
    for (int j=0; j<N; j++)
    {
        P[i][j]=someLongFunction(x,y);
        y+=1;
    }
    x+=1;
}
The important thing in this loop is that the matrix P must be identical in the scalar and parallel versions.
None of my attempts so far have succeeded...
If you parallelize the inner loop, you will not see a performance gain, because the small amount of work the inner loop performs does not outweigh the overhead of parallel processing. Parallelizing only the outer loop is therefore the best way to maximize the benefit of concurrency on most systems.
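For contrast, here is a sketch of the inner-loop placement that paragraph warns against (it uses the dependency-free loop body derived below): the worksharing construct, and the implicit barrier at its end, is executed once per row, N times in total, instead of once.

for (int i=0; i<N; i++)
{
    // Entered N times: every row forks out work and ends with an implicit barrier.
    #pragma omp parallel for
    for (int j=0; j<N; j++)
    {
        P[i][j]=someLongFunction((double)i, (double)N*i + j);
    }
}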
The problem here is that you have added iteration-to-iteration dependencies with:
x+=1;
y+=1;
Therefore, as the code stands right now, it is not parallelizable: attempting it anyway will produce incorrect results (as you are probably seeing).
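To see why, consider what the naive attempt does. This is a sketch of the incorrect version, not a fix: x and y are shared between the threads by default, so every thread reads and increments them concurrently, and the arguments passed to someLongFunction become nondeterministic.

#pragma omp parallel for   // WRONG: x and y are shared across all threads
for (int i=0; i<N; i++)
{
    for (int j=0; j<N; j++)
    {
        P[i][j]=someLongFunction(x,y);  // data race: x and y are being
        y+=1;                           // updated by every thread at once
    }
    x+=1;
}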
Fortunately, in your case, you can compute them directly without introducing this dependency: by the time the loop reaches element (i, j), x has been incremented exactly i times and y exactly N*i + j times:
for (int i=0; i<N; i++)
{
    for (int j=0; j<N; j++)
    {
        P[i][j]=someLongFunction((double)i, (double)N*i + j);
    }
}
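If you want to convince yourself of the index arithmetic, a quick scalar check makes it concrete (a sketch; it needs <assert.h> and assumes N is defined as in your code):

#include <assert.h>

double x=0.0, y=0.0;
for (int i=0; i<N; i++)
{
    for (int j=0; j<N; j++)
    {
        assert(x == (double)i);        // x counts completed outer iterations
        assert(y == (double)N*i + j);  // y counts all inner iterations so far
        y+=1;
    }
    x+=1;
}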
Now you can try throwing an OpenMP pragma over this and see if it works:
#pragma omp parallel for
for (int i=0; i<N; i++)
{
    for (int j=0; j<N; j++)
    {
        P[i][j]=someLongFunction((double)i, (double)N*i + j);
    }
}
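Since your requirement is that P match the scalar version exactly, it is worth checking that directly. Below is a minimal harness sketch (it assumes someLongFunction is deterministic; memcmp gives a bit-for-bit comparison). Compile with -fopenmp on GCC/Clang or /openmp on MSVC.

#include <stdio.h>
#include <string.h>

double P_scalar[N][N], P_parallel[N][N];

/* Original scalar version */
double x=0.0, y=0.0;
for (int i=0; i<N; i++)
{
    for (int j=0; j<N; j++)
    {
        P_scalar[i][j]=someLongFunction(x,y);
        y+=1;
    }
    x+=1;
}

/* Parallel rewrite */
#pragma omp parallel for
for (int i=0; i<N; i++)
    for (int j=0; j<N; j++)
        P_parallel[i][j]=someLongFunction((double)i, (double)N*i + j);

if (memcmp(P_scalar, P_parallel, sizeof P_scalar) == 0)
    printf("P matches the scalar version\n");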