Not sure what should be SHARED or PRIVATE in OpenMP loop

I have a loop which updates a matrix A and I want to parallelise it with OpenMP, but I'm not sure which variables should be shared and which private. I would have thought making just ii and jj private would work, but it doesn't. I think I need an !$OMP ATOMIC UPDATE somewhere too...

The loop just calculates the pairwise distances between the N particles and updates the matrix A.

            !$OMP PARALLEL DO PRIVATE(ii,jj)
            do ii=1,N-1
                    do jj=ii+1,N
                            distance_vector=X(ii,:)-X(jj,:)
                            distance2=sum(distance_vector*distance_vector)
                            distance=DSQRT(distance2)
                            coff=distance*distance*distance
                            PE=PE-M(II)*M(JJ)/distance
                            A(jj,:)=A(jj,:)+(M(ii)/coff)*(distance_vector)
                            A(ii,:)=A(ii,:)-(M(jj)/coff)*(distance_vector)
                    end do
            end do
            !$OMP END PARALLEL DO
asked Jan 16 '13 by Griff



1 Answer

The golden rule of OpenMP is that all variables (with some exclusions) that are defined in an outer scope are shared by default in the parallel region. Since Fortran before 2008 has no local scopes (i.e. no BLOCK ... END BLOCK in earlier versions), all variables (except threadprivate ones) are shared, which is very natural for me (unlike Ian Bush, I am not a big fan of using default(none) and then redeclaring the visibility of all 100+ local variables in various complex scientific codes).
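For illustration, here is a minimal sketch (a hypothetical fragment, not your loop) of what DEFAULT(NONE) forces you to write; with the implicit default, N, X and A would simply be shared without being listed:

!$OMP PARALLEL DO DEFAULT(NONE) SHARED(N,X,A) PRIVATE(ii,tmp)
do ii=1,N
   tmp=2.0d0*X(ii,1)   ! tmp is private: each thread has its own copy
   A(ii,1)=tmp         ! A and X are shared: one instance seen by all threads
end do
!$OMP END PARALLEL DO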

Here is how to determine the sharing class of each variable:

  • N - shared, because it should be the same in all threads and they only read its value.
  • ii - the counter of a loop that is subject to a worksharing directive, so its sharing class is predetermined to be private. It doesn't hurt to declare it explicitly in a PRIVATE clause, but that is not really necessary.
  • jj - the counter of a loop that is not subject to a worksharing directive, hence jj should be private.
  • X - shared, because all threads reference and only read from it.
  • distance_vector - obviously should be private as each thread works on different pairs of particles.
  • distance, distance2, and coff - ditto.
  • M - should be shared for the same reasons as X.
  • PE - acts as an accumulator variable (I guess this is the potential energy of the system) and should be the subject of a reduction operation, i.e. it should be put in a REDUCTION(+:...) clause.
  • A - this one is tricky. It could either be shared, with the updates to A(jj,:) protected by synchronising constructs, or you could use a reduction (OpenMP allows reductions over array variables in Fortran, unlike in C/C++). A(ii,:) is never modified by more than one thread, so it does not need special treatment.

With reduction over A in place, each thread would get its own private copy of A, which could be a memory hog, although I doubt you would use this direct O(N²) simulation code to compute systems with a very large number of particles. There is also a certain overhead associated with the reduction implementation. In this case you simply need to add A to the list of the REDUCTION(+:...) clause.
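A sketch of that variant (assuming your compiler supports array reductions in Fortran, which mainstream OpenMP implementations do):

!$OMP PARALLEL DO PRIVATE(jj,distance_vector,distance2,distance,coff) &
!$OMP& REDUCTION(+:PE,A)
do ii=1,N-1
   do jj=ii+1,N
      distance_vector=X(ii,:)-X(jj,:)
      distance2=sum(distance_vector*distance_vector)
      distance=DSQRT(distance2)
      coff=distance*distance*distance
      PE=PE-M(II)*M(JJ)/distance
      ! both updates are now race-free: each thread accumulates into its
      ! own private copy of A and the copies are summed at the end
      A(jj,:)=A(jj,:)+(M(ii)/coff)*(distance_vector)
      A(ii,:)=A(ii,:)-(M(jj)/coff)*(distance_vector)
   end do
end do
!$OMP END PARALLEL DO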

With synchronising constructs you have two options. You could either use the ATOMIC construct or the CRITICAL construct. As ATOMIC is only applicable to scalar contexts, you would have to "unvectorise" the assignment loop and apply ATOMIC to each statement separately, e.g.:

!$OMP ATOMIC UPDATE
A(jj,1)=A(jj,1)+(M(ii)/coff)*(distance_vector(1))
!$OMP ATOMIC UPDATE
A(jj,2)=A(jj,2)+(M(ii)/coff)*(distance_vector(2))
!$OMP ATOMIC UPDATE
A(jj,3)=A(jj,3)+(M(ii)/coff)*(distance_vector(3))

You may also rewrite this as a loop - do not forget to declare the loop counter private.

With CRITICAL there is no need to unvectorise the loop:

!$OMP CRITICAL (forceloop)
A(jj,:)=A(jj,:)+(M(ii)/coff)*(distance_vector)
!$OMP END CRITICAL (forceloop)

Naming critical regions is optional and a bit unnecessary in this particular case, but in general it allows you to separate unrelated critical regions.
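A quick sketch of why the names matter (hist and kk here are hypothetical, standing in for an unrelated update elsewhere in the code): regions sharing a name exclude one another, while differently named regions never block each other.

!$OMP CRITICAL (forceloop)
A(jj,:)=A(jj,:)+(M(ii)/coff)*(distance_vector)   ! serialised only against other (forceloop) regions
!$OMP END CRITICAL (forceloop)

!$OMP CRITICAL (histogram)
hist(kk)=hist(kk)+1                              ! hypothetical unrelated update; never waits for (forceloop)
!$OMP END CRITICAL (histogram)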

Which is faster, unvectorised with ATOMIC or with CRITICAL? It depends on many things. Usually CRITICAL is way slower since it often involves function calls into the OpenMP runtime, while atomic increments, at least on x86, are implemented with locked addition instructions. As they often say, YMMV.

To recapitulate, a working version of your loop should be something like:

!$OMP PARALLEL DO PRIVATE(jj,kk,distance_vector,distance2,distance,coff) &
!$OMP& REDUCTION(+:PE)
do ii=1,N-1
   do jj=ii+1,N
      distance_vector=X(ii,:)-X(jj,:)
      distance2=sum(distance_vector*distance_vector)
      distance=DSQRT(distance2)
      coff=distance*distance*distance
      PE=PE-M(II)*M(JJ)/distance
      do kk=1,3
         !$OMP ATOMIC UPDATE
         A(jj,kk)=A(jj,kk)+(M(ii)/coff)*(distance_vector(kk))
      end do
      A(ii,:)=A(ii,:)-(M(jj)/coff)*(distance_vector)
   end do
end do
!$OMP END PARALLEL DO

I've assumed that your system is 3-dimensional.


With all this said, I second Ian Bush's advice that you need to rethink how the position and acceleration matrices are laid out in memory. Proper cache usage could boost your code and would also allow certain operations, e.g. X(:,ii)-X(:,jj), to be vectorised, i.e. implemented using vector SIMD instructions.
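A sketch of what that layout change could look like (assuming you are free to transpose X and A so the three coordinates of a particle sit next to each other in Fortran's column-major storage):

! declared as (3,N) instead of (N,3): the coordinates of one particle
! are now contiguous in memory, so each slice below is a single
! contiguous, SIMD-friendly load or store
real(kind(1.0d0)), dimension(3,N) :: X, A

distance_vector=X(:,ii)-X(:,jj)
A(:,jj)=A(:,jj)+(M(ii)/coff)*distance_vector
A(:,ii)=A(:,ii)-(M(jj)/coff)*distance_vector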

answered Sep 23 '22 by Hristo Iliev