
Numerical differences across threads (OpenMP on Cygwin)

I expect the following Fortran code to produce the same results on every thread. I am working on 32-bit Windows 7 with an up-to-date Cygwin; the gfortran version is 4.8.3.

program strange
    use omp_lib
    implicit none


    real(kind=8) :: X(3)
    real(kind=8) :: R
    real(kind=8) :: R3

    !$omp parallel private(X,R,R3) default(none)

       X(1)=7.d0
       X(2)=5.3d0
       X(3)=0.d0

       R = dsqrt(X(1)**2 + X(2)**2 +X(3)**2)
       R3 = R*R*R

       write(*,*) "Thread ", omp_get_thread_num(), " results: ", R, R3


    !$omp end parallel

end program

On my machine I get

radg@pc_radg ~/morralla/terror
$ gfortran terror.f90 -fopenmp

radg@pc_radg ~/morralla/terror
$ ./a.exe
 Thread            1  results:    8.7800911157003387        676.85722410933931
 Thread            0  results:    8.7800911157003370        676.85722410933886
 Thread            2  results:    8.7800911157003387        676.85722410933931
 Thread            3  results:    8.7800911157003387        676.85722410933931

After running the program several times, I see that thread 0 always prints the same result, and that result differs from all the other threads. Changing the number of threads spawned (export OMP_NUM_THREADS=x) makes no difference: I still get the same wrong results from thread 0.

However, when I change the optimization level, the results agree across all threads:

radg@pc_radg ~/morralla/terror
$ gfortran -O3 terror.f90 -fopenmp

radg@pc_radg ~/morralla/terror
$ ./a.exe
 Thread            0  results:    8.7800911157003387        676.85722410933931
 Thread            1  results:    8.7800911157003387        676.85722410933931
 Thread            3  results:    8.7800911157003387        676.85722410933931
 Thread            2  results:    8.7800911157003387        676.85722410933931

The same program works properly on 64-bit Linux machines (both 32-bit and 64-bit binaries). An example of such output:

 Thread            3  results:    8.7800911157003387        676.85722410933931
 Thread            0  results:    8.7800911157003387        676.85722410933931
 Thread            1  results:    8.7800911157003387        676.85722410933931
 Thread            2  results:    8.7800911157003387        676.85722410933931

Any idea why this could be happening in my particular environment?

asked Nov 11 '22 04:11 by Calculon


1 Answer

Have you considered that Fortran double precision typically has only 15 guaranteed significant digits?

Thread            1  results:    8.7800911157003387        676.85722410933931
Thread            0  results:    8.7800911157003370        676.85722410933886
Digits                      :    1 23456789012345--        123 456789012345--

In general, this means that nothing after the 15th digit can be trusted, because of the intricacies of floating-point operations.
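To put a number on how small the discrepancy is, here is a minimal sketch that measures the gap between the two printed results in units in the last place (ulp). It assumes that real(kind=8) in the question corresponds to kind(1.0d0), which holds for gfortran; the program name and the constant names are made up for illustration, and the four values are copied from the output in the question.

```fortran
program ulp_check
    implicit none
    ! Assumption: kind(1.0d0) is the same kind as real(kind=8) in the question.
    integer, parameter :: dp = kind(1.0d0)

    ! Values copied from the thread 1 and thread 0 output in the question.
    real(dp), parameter :: r_a  = 8.7800911157003387_dp
    real(dp), parameter :: r_b  = 8.7800911157003370_dp
    real(dp), parameter :: r3_a = 676.85722410933931_dp
    real(dp), parameter :: r3_b = 676.85722410933886_dp

    ! Decimal digits guaranteed by the type, and its machine epsilon.
    write(*,*) "precision(dp) = ", precision(1.0_dp)
    write(*,*) "epsilon(dp)   = ", epsilon(1.0_dp)

    ! spacing(x) is the distance from x to the nearest neighbouring
    ! representable number, so dividing by it gives the gap in ulp.
    write(*,*) "R  differs by ", (r_a  - r_b ) / spacing(r_b ), " ulp"
    write(*,*) "R3 differs by ", (r3_a - r3_b) / spacing(r3_b), " ulp"
end program ulp_check
```

With IEEE 754 doubles I would expect precision to report 15 and both gaps to come out as a handful of ulp at most, i.e. the two threads disagree only in the last bits of the mantissa.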

You might want to read up on that here.

In particular, this post in the series, concerning precision, explains why you always get the same result on thread 0 as long as you don't recompile:

... this guarantee is mostly straightforward (if you haven’t recompiled then you’ll get the same results) but nailing it down precisely is tricky.

...

So the guarantee is really that the same machine code will produce the same results, as long as you don’t do something wacky.

...

Additionally, this post in the series, concerning doubles, might interest you too.

answered Nov 15 '22 09:11 by Max Graser