I'm wondering why the following code produces different results in its scalar and parallel variants:
#define N 10
double P[N][N];
// zero the matrix just to be sure...
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        P[i][j] = 0.0;

double xmin = -5.0, ymin = -5.0, xmax = 5.0, ymax = 5.0;
double x = xmin, y = ymin;
double step = abs(xmax - xmin) / (double)(N - 1);

for (int i = 0; i < N; i++)
{
    #pragma omp parallel for ordered schedule(dynamic)
    for (int j = 0; j < N; j++)
    {
        x = i*step + xmin;
        y = j*step + ymin;
        P[i][j] = x + y;
    }
}
This code produces slightly different results in its two versions (the scalar version simply has the #pragma line commented out).
What I've noticed is that a very small percentage of the elements of P[i][j] in the parallel version differ from those of the scalar version, and I'm wondering why.
Putting the #pragma on the outer loop, as was suggested, is a mess: it gives completely wrong results.
P.S. g++-4.4, intel i7, linux
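For reference, here is a minimal self-contained sketch (my addition, not from the original post) that runs both variants and counts the mismatching elements; the fill() helper and the if(use_threads) clause on the pragma are assumptions made for the demonstration. Compile with g++ -fopenmp:

#include <cmath>
#include <cstdio>

#define N 10

// fill P with the grid values, either serially or with the racy parallel loop
static void fill(double P[N][N], int use_threads)
{
    double xmin = -5.0, ymin = -5.0, xmax = 5.0;
    double x = xmin, y = ymin;
    double step = std::fabs(xmax - xmin) / (double)(N - 1);
    for (int i = 0; i < N; i++)
    {
        // the if() clause enables threading only for the parallel run
        #pragma omp parallel for schedule(dynamic) if(use_threads)
        for (int j = 0; j < N; j++)
        {
            x = i*step + xmin;   // x and y are shared: this is the race under test
            y = j*step + ymin;
            P[i][j] = x + y;
        }
    }
}

int main()
{
    double A[N][N], B[N][N];
    fill(A, 0);   // serial reference run
    fill(B, 1);   // parallel run with the race
    int bad = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (A[i][j] != B[i][j])
                bad++;
    std::printf("%d of %d elements differ\n", bad, N * N);
    return 0;
}

On a multi-core machine this usually reports a small nonzero count, matching the symptom described above.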
Ah, now I can see the problem. Your comment on the last question didn't have enough context for me to see it. But now it's clear.
The problem is here:
x = i*step+xmin;
y = j*step+ymin;
x and y are declared outside the parallel region, so they are shared among all the threads, which creates a nasty race condition: between one thread's write to x and its read in P[i][j] = x + y, another thread can overwrite x (or y) with its own value, so a small fraction of the elements end up with the wrong sum.
To fix it, make them local:
for (int j = 0; j < N; j++)
{
    // x and y are now private to each iteration, and thus to each thread
    double x = i*step + xmin;
    double y = j*step + ymin;
    P[i][j] = x + y;
}
With this fix, you should be able to put the #pragma on the outer loop instead of the inner loop.
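Putting it all together, here is a minimal self-contained sketch of the corrected program (my reconstruction; it assumes the snippet lives in main() and is compiled with g++ -fopenmp). It drops the initial zeroing loop, which is redundant because every element is assigned, drops the ordered clause, which does nothing here since the loop body contains no ordered construct, omits the unused ymax, and uses std::fabs so the call cannot resolve to the integer abs() overload:

#include <cmath>
#include <cstdio>

#define N 10

int main()
{
    double P[N][N];
    double xmin = -5.0, ymin = -5.0, xmax = 5.0;

    // step size of the uniform grid; std::fabs keeps the call in floating point
    double step = std::fabs(xmax - xmin) / (double)(N - 1);

    // parallelizing the outer loop is now safe: x and y are per-iteration locals
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < N; i++)
    {
        for (int j = 0; j < N; j++)
        {
            double x = i*step + xmin;
            double y = j*step + ymin;
            P[i][j] = x + y;
        }
    }

    // quick sanity check of the result
    for (int i = 0; i < N; i++)
    {
        for (int j = 0; j < N; j++)
            std::printf("%7.2f", P[i][j]);
        std::printf("\n");
    }
    return 0;
}

An alternative to declaring x and y inside the loop is to keep them at their original scope and add private(x, y) to the pragma, which gives each thread its own copy; declaring them at the point of first use is generally the cleaner style.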