Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to break out of a nested parallel (OpenMP) Fortran loop idiomatically?

Here's sequential code:

do i = 1, n
   do j = i+1, n
      if ("some_condition(i,j)") then
         result = "here's result"
         return
      end if
   end do
end do

Is there a cleaner way to execute iterations of the outer loop concurrently other than:

  !$OMP PARALLEL private(i,j)
  !$OMP DO 
  do i = 1, n     
     !$OMP FLUSH(found)
     if (found) goto 10
     do j = i+1, n        
        if ("some_condition(i,j)") then
           !$OMP CRITICAL
           !$OMP FLUSH(found)
           if (.not.found) then           
              found = .true.
              result = "here's result"
           end if
           !$OMP FLUSH(found)
           !$OMP END CRITICAL
           goto 10
        end if
     end do
10   continue
  end do
  !$OMP END DO NOWAIT
  !$OMP END PARALLEL

The order of iterations over i-loop may be arbitrary as long as some result is found (it doesn't matter if it changes from run to run as long as it satisfies "some_condition").

like image 507
jfs Avatar asked Jun 05 '10 09:06

jfs


People also ask

How do I break a loop in OpenMP?

OpenMP does not provide a mechanism to break out of a parallel loop. However, you can use a Boolean value, or flag, to enable an iteration of the loop to indicate that the solution has been found. The Concurrency Runtime provides functionality that enables one task to cancel other tasks that have not yet started.

Is nested parallelism possible in OpenMP?

OpenMP parallel regions can be nested inside each other. If nested parallelism is disabled, then the new team created by a thread encountering a parallel construct inside a parallel region consists only of the encountering thread. If nested parallelism is enabled, then the new team may consist of more than one thread.

How does OpenMP parallel for work?

OpenMP in a nutshellParallel code with OpenMP marks, through a special directive, sections to be executed in parallel. The part of the code that’s marked to run in parallel will cause threads to form. The main tread is the master thread. The slave threads all run in parallel and run the same code.


1 Answers

It seems that your sequential code has a dependency that makes it unsuitable to being made parallel. Suppose that there are multiple values of i & j that make "some condition" true -- then the order of execution of the i & j do loops determines which of these conditions is found first and sets the value of result, after which the return statement ends the search for additional cases i,j that "some condition" is true. In the sequential code, the do loops always execute in the same order, so the operation of the program is deterministic and identical values of i & j that make "some condition" true will always be found. In a concurrent version, various loops i execute in non-deterministic order, so that from run to run different values of i might be the first i-value that finds a true "some condition".

Perhaps you as a programmer know that there is only one value of i & j that results in a true "some condition"? In that case short-circuiting the execution would seem OK. But the OpenMP spec says that "No statement in the associated loops other than the DO statements can cause a branch out of the loops" so having the something in the inner loop abort the output loop isn't allowed. If it is the case that there is always only one true "some condition", you could just remove the "return" and waste CPU time by having threads look for "some condition" is true after the one case has been found. That might still be faster than a sequential program. With a scaler "result" variable, it still probably non-compliant, having an dependency on the order of execution. You could change it in to a "reduction", summing the result, or return result as 1-D array of dimension (n). If you need to find the smallest value of i that has "some condition" true, you could obtain that from an array result using the Fortran instrinsic function minloc.

A solution with many "flush" and "critical" directives may not be faster than the sequential version.

UPDATE: Based on the clarification that multiple results are possible and that any will do, one parallel method would be to return mutiple results and let sequential code pick one out -- make "result" into a 1D array rather than a scaler. You are allowed to short-circuit the inner j-loop because it is not "associated" with the "omp do" directive, so "result" need only be 1D, dimensioned according to the range of i. So something like this:

program test1

integer :: i, j
integer, parameter :: n = 10
integer, dimension (n) :: result

result = -999

!omp parallel default (shared) private (i, j)
!omp do
do i = 1, n
   inner: do j = i+1, n
      if ( mod (i+j,14) == 0 ) then
         result (i) = i
         exit inner
      end if
   end do inner
end do
!omp end do
!omp end parallel

write (*, *) 'All results'
write (*, *) result

write (*, *)
write (*, *) 'One result'
write (*, *) result ( maxloc (result, 1) )

end program test1
like image 196
M. S. B. Avatar answered Sep 20 '22 07:09

M. S. B.