
When should I use DO CONCURRENT and when OpenMP?

I am aware of this and this, but I ask again as the first link is pretty old now, and the second link did not seem to reach a conclusive answer. Has any consensus developed?

My problem is simple:

I have a DO loop whose iterations may be run concurrently. Which method should I use?

Below is code to generate particles on a simple cubic lattice.

  • npart is the number of particles
  • npart_edge and npart_face are the numbers of particles along an edge and on a face, respectively
  • space is the lattice spacing
  • Rx, Ry, Rz are position arrays
  • x, y, z are temporary variables that determine the position on the lattice

Note that x, y and z have to be arrays in the DO CONCURRENT case, but not in the OpenMP case, because there they can be declared PRIVATE.

So do I use DO CONCURRENT (which, as I understand from the links above, uses SIMD):

DO CONCURRENT (i = 1:npart)
    x(i) = MODULO(i-1, npart_edge)
    Rx(i) = space*x(i)
    y(i) = MODULO( ( (i-1) / npart_edge ), npart_edge)
    Ry(i) = space*y(i)
    z(i) = (i-1) / npart_face
    Rz(i) = space*z(i)
END DO

Or do I use OpenMP?

!$OMP PARALLEL DEFAULT(SHARED) PRIVATE(x,y,z)
!$OMP DO
DO i = 1, npart
    x = MODULO(i-1, npart_edge)
    Rx(i) = space*x
    y = MODULO( ( (i-1) / npart_edge ), npart_edge)
    Ry(i) = space*y
    z = (i-1) / npart_face
    Rz(i) = space*z
END DO
!$OMP END DO
!$OMP END PARALLEL
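For comparison, here is a sketch of the same DO CONCURRENT loop with x, y and z kept as scalars. Declaring them inside a BLOCK construct in the loop body gives each iteration its own copy, much as PRIVATE does in the OpenMP version. It assumes a compiler that supports BLOCK inside DO CONCURRENT (Fortran 2008); the module name `lattice_block`, the subroutine `place_cubic`, the integer type of x, y, z and the `dp` kind (standing in for `-real-size 64`) are hypothetical choices for illustration.

```fortran
module lattice_block
  implicit none
  ! Double-precision kind, standing in for the -real-size 64 compiler flag.
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical helper: places npart particles on a simple cubic lattice.
  subroutine place_cubic(npart, npart_edge, npart_face, space, Rx, Ry, Rz)
    integer, intent(in) :: npart, npart_edge, npart_face
    real(dp), intent(in) :: space
    real(dp), intent(out) :: Rx(npart), Ry(npart), Rz(npart)
    integer :: i

    ! Note the colon form of the index range in the DO CONCURRENT header.
    do concurrent (i = 1:npart)
      block
        ! Declared inside BLOCK, so each iteration has its own x, y, z.
        integer :: x, y, z
        x = modulo(i - 1, npart_edge)
        y = modulo((i - 1) / npart_edge, npart_edge)
        z = (i - 1) / npart_face
        Rx(i) = space * x
        Ry(i) = space * y
        Rz(i) = space * z
      end block
    end do
  end subroutine place_cubic
end module lattice_block
```

Fortran 2018 also offers a LOCAL locality specifier, `do concurrent (i = 1:npart) local(x, y, z)`, for the same purpose, where the compiler supports it.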

My tests:

Placing 64 particles in a box of side 10:

$ ifort -qopenmp -real-size 64 omp.f90
$ ./a.out 
CPU time =  6.870000000000001E-003
Real time =  3.600000000000000E-003

$ ifort -real-size 64 concurrent.f90 
$ ./a.out 
CPU time =  6.699999999999979E-005
Real time =  0.000000000000000E+000

Placing 100000 particles in a box of side 100:

$ ifort -qopenmp -real-size 64 omp.f90
$ ./a.out 
CPU time =  8.213300000000000E-002
Real time =  1.280000000000000E-002

$ ifort -real-size 64 concurrent.f90 
$ ./a.out 
CPU time =  2.385000000000000E-003
Real time =  2.400000000000000E-003

Using the DO CONCURRENT construct seems to be giving me at least an order of magnitude better performance. This was done on an i7-4790K. Also, the advantage of concurrency seems to decrease with increasing size.

asked Jul 24 '16 by physkets

1 Answer

DO CONCURRENT does not do any parallelization per se. The compiler may decide to parallelize it using threads, use SIMD instructions, or even offload it to a GPU. For threads, you often have to instruct it to do so. For GPU offloading, you need a particular compiler with particular options. Or, quite often, the compiler just treats DO CONCURRENT as a regular DO loop and uses SIMD only where it would for the regular DO.

OpenMP is not just threads either; the compiler can use SIMD instructions if it wants. There is also the omp simd directive, but that is only a suggestion to the compiler to use SIMD, and it can be ignored.
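As a hedged illustration of combining the two, OpenMP 4.0 and later provide a combined `parallel do simd` construct that splits the iterations across threads and asks the compiler to vectorize each thread's chunk. The sketch below applies it to the question's loop; the module name `lattice_simd`, the subroutine `place_cubic_simd` and the `dp` kind are assumptions. Whether SIMD code is actually generated is still up to the compiler.

```fortran
module lattice_simd
  implicit none
  ! Double-precision kind, standing in for the -real-size 64 compiler flag.
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical helper: same lattice loop, threaded and (hopefully) vectorized.
  subroutine place_cubic_simd(npart, npart_edge, npart_face, space, Rx, Ry, Rz)
    integer, intent(in) :: npart, npart_edge, npart_face
    real(dp), intent(in) :: space
    real(dp), intent(out) :: Rx(npart), Ry(npart), Rz(npart)
    integer :: i, x, y, z

    ! Threads share out the iterations statically; within each chunk the
    ! compiler is asked to use SIMD. x, y, z are private per iteration.
    !$omp parallel do simd private(x, y, z) schedule(static)
    do i = 1, npart
      x = modulo(i - 1, npart_edge)
      y = modulo((i - 1) / npart_edge, npart_edge)
      z = (i - 1) / npart_face
      Rx(i) = space * x
      Ry(i) = space * y
      Rz(i) = space * z
    end do
    !$omp end parallel do simd
  end subroutine place_cubic_simd
end module lattice_simd
```

Compile with an OpenMP flag such as `-qopenmp` (ifort, as in the question) or `-fopenmp` (gfortran); without one, the `!$omp` lines are ordinary comments and the loop runs serially.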

You should try it, measure, and see. There is no single definitive answer, not even for a given compiler, let alone for all compilers.

If you are not using OpenMP anyway, I would give DO CONCURRENT a try to see whether the automatic parallelizer does a better job with this construct. Chances are good that it will help. If your code already uses OpenMP, I do not see any point in introducing DO CONCURRENT.

My practice is to use OpenMP and to make sure the compiler vectorizes (SIMD) what it can, especially because I use OpenMP all over my program anyway. DO CONCURRENT still has to prove that it is actually useful. I am not convinced yet, but some GPU examples look promising; however, real codes are often much more complex.


Your specific examples and the performance measurement:

Too little code is given, and every benchmark has subtle points. I wrote some simple code around your loops and did my own tests. I was careful NOT to include thread creation in the timed block; you should not include !$omp parallel in your timing. I also took the minimum real time over multiple computations, because the first run is sometimes longer (certainly with DO CONCURRENT): the CPU has various throttle modes and may need some time to spin up. I also added SCHEDULE(STATIC).
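One way to follow this advice is sketched below, under stated assumptions: the parallel region is opened once, each pass over the loop is timed between two `single` constructs whose implicit barriers keep the threads in step, and the minimum over `nrep` passes is kept. The names `lattice_timing`, `time_lattice` and `nrep` are hypothetical, and `system_clock` is used instead of `omp_get_wtime` so that the same code also compiles (and times the serial loop) without OpenMP.

```fortran
module lattice_timing
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: i8 = selected_int_kind(18)
contains
  ! Hypothetical harness: fills Rx, Ry, Rz nrep times and returns in tmin the
  ! minimum wall time (seconds) of one pass. The parallel region is opened
  ! once, outside the timed section, so thread start-up is not measured.
  subroutine time_lattice(npart, npart_edge, npart_face, space, nrep, &
                          Rx, Ry, Rz, tmin)
    integer, intent(in) :: npart, npart_edge, npart_face, nrep
    real(dp), intent(in) :: space
    real(dp), intent(out) :: Rx(npart), Ry(npart), Rz(npart)
    real(dp), intent(out) :: tmin
    integer(i8) :: c0, c1, rate
    integer :: i, rep, x, y, z

    tmin = huge(1.0_dp)
    call system_clock(count_rate=rate)

    !$omp parallel default(shared) private(x, y, z, rep)
    do rep = 1, nrep
      !$omp single
      call system_clock(c0)          ! one thread starts the clock
      !$omp end single               ! implicit barrier: nobody starts early

      !$omp do schedule(static)
      do i = 1, npart
        x = modulo(i - 1, npart_edge)
        y = modulo((i - 1) / npart_edge, npart_edge)
        z = (i - 1) / npart_face
        Rx(i) = space * x
        Ry(i) = space * y
        Rz(i) = space * z
      end do
      !$omp end do                   ! implicit barrier: all iterations done

      !$omp single
      call system_clock(c1)
      ! Keep the fastest pass; early passes may include CPU spin-up.
      tmin = min(tmin, real(c1 - c0, dp) / real(rate, dp))
      !$omp end single
    end do
    !$omp end parallel
  end subroutine time_lattice
end module lattice_timing
```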

npart=10000000
ifort -O3 concurrent.f90: 6.117300000000000E-002
ifort -O3 concurrent.f90 -parallel: 5.044600000000000E-002
ifort -O3 concurrent_omp.f90: 2.419600000000000E-002

npart=10000, default 8 threads (hyper-threading)
ifort -O3 concurrent.f90: 5.430000000000000E-004
ifort -O3 concurrent.f90 -parallel: 8.899999999999999E-005
ifort -O3 concurrent_omp.f90: 1.890000000000000E-004

npart=10000, OMP_NUM_THREADS=4 (ignore hyper-threading)
ifort -O3 concurrent.f90: 5.410000000000000E-004
ifort -O3 concurrent.f90 -parallel: 9.200000000000000E-005
ifort -O3 concurrent_omp.f90: 1.070000000000000E-004

Here, DO CONCURRENT (with -parallel) seems to be somewhat faster for the small case, but not by much if we make sure to use the right number of cores. It is clearly slower for the big case. The -parallel option is clearly necessary for the automatic parallelization.

answered Sep 21 '22 by Vladimir F Героям слава