How to avoid the overhead of OpenMP in nested loops

Tags: c, openmp

I have two versions of code that produce equivalent results, in which I am trying to parallelize only the inner loop of a nested for loop. I am not getting much speedup, but I didn't expect a 1-to-1 speedup since I'm parallelizing only the inner loop.

My main question is: why do these two versions have similar runtimes? Doesn't the second version fork threads only once, avoiding the overhead of starting new threads on every iteration over i as in the first version?

The first version of code starts up threads on every iteration of the outer loop like this:

for(i=0; i<2000000; i++){
  sum = 0;
  #pragma omp parallel for private(j) reduction(+:sum)
  for(j=0; j<1000; j++){
    sum += 1;
  }
  final += sum;
}
printf("final=%d\n",final/2000000);

With this output and runtime:

OMP_NUM_THREADS=1

final=1000
real    0m5.847s
user    0m5.628s
sys     0m0.212s

OMP_NUM_THREADS=4

final=1000
real    0m4.017s
user    0m15.612s
sys     0m0.336s

The second version of the code starts threads once (?) before the outer loop and parallelizes the inner loop like this:

#pragma omp parallel private(i,j)
for(i=0; i<2000000; i++){
  sum = 0;
  #pragma omp barrier
  #pragma omp for reduction(+:sum)
  for(j=0; j<1000; j++){
    sum += 1;
  }
  #pragma omp single
  final += sum;
}
printf("final=%d\n",final/2000000);

With this output and runtime:

OMP_NUM_THREADS=1

final=1000
real    0m5.476s
user    0m4.964s
sys     0m0.504s

OMP_NUM_THREADS=4

final=1000
real    0m4.347s
user    0m15.984s
sys     0m1.204s

Why isn't the second version much faster than the first? Doesn't it avoid the overhead of starting threads on every loop iteration, or am I doing something wrong?

asked Jun 06 '16 01:06

nick

1 Answer

An OpenMP implementation may use thread pooling to eliminate the overhead of starting threads each time a parallel construct is encountered. A pool of OMP_NUM_THREADS threads is started for the first parallel construct, and after the construct completes, the slave threads are returned to the pool. These idle threads can be reused when a later parallel construct is encountered. In that case the first version does not create new threads on every iteration of the outer loop, which would explain why the two versions run in similar time.
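One way to check whether pooling is in effect on a given system is to time a large number of nearly empty parallel regions. The following is only a rough sketch: `now_seconds`, `count_checkins`, and `seconds_per_region` are hypothetical helper names, and the measured figure includes the cost of the atomic increment as well as the fork/join itself. A per-region time far below the cost of creating threads would be consistent with the runtime reusing a pool.

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock seconds; falls back to CPU time from clock() when the
   file is compiled without OpenMP support. */
static double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Hypothetical helper: enter `regions` parallel regions one after
   another. Every thread of every team increments the counter once,
   so the result is regions * team size (assuming a fixed team size). */
long count_checkins(int regions)
{
    long count = 0;
    for (int i = 0; i < regions; i++) {
        #pragma omp parallel
        {
            #pragma omp atomic
            count++;
        }
    }
    return count;
}

/* Hypothetical helper: average time spent per parallel region,
   including the atomic increment done inside each region. */
double seconds_per_region(int regions)
{
    if (regions <= 0) {
        return 0.0;
    }
    double start = now_seconds();
    long checkins = count_checkins(regions);
    double elapsed = now_seconds() - start;
    (void)checkins; /* only needed so the regions are not empty */
    return elapsed / regions;
}
```

Calling `seconds_per_region(2000000)` with OMP_NUM_THREADS=4 and comparing the result with the per-iteration time of the first version gives a rough idea of how much of that runtime is region entry and exit rather than loop work.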

See for example this explanation of thread pooling in the Sun Studio OpenMP implementation.
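If threads are pooled, what both versions still pay on every outer iteration is synchronization: the second version has an explicit barrier plus the implicit barriers at the end of the `omp for` and the `omp single`. The sketch below is not the code from the question but a restructured variant, and it relies on an assumption that holds for this toy example only: the per-iteration sums do not depend on each other, so each thread can keep one running total and the reduction can happen once at the end. `nested_sum` is a hypothetical name.

```c
/* Hypothetical helper: same arithmetic as the loops in the question,
   but with a single parallel region and a single reduction. */
long nested_sum(int outer, int inner)
{
    long total = 0;

    /* Each thread gets its own zero-initialized copy of `total`;
       the copies are added together when the region ends. */
    #pragma omp parallel reduction(+:total)
    {
        /* `i` is declared inside the region, so it is private and
           every thread runs all outer iterations. */
        for (int i = 0; i < outer; i++) {
            /* The inner iterations are shared among the threads.
               nowait drops the implicit barrier at the end of the
               worksharing loop, so threads do not wait for each
               other between outer iterations. */
            #pragma omp for nowait
            for (int j = 0; j < inner; j++) {
                total += 1;
            }
        }
    }
    return total;
}
```

With the sizes from the question, `nested_sum(2000000, 1000) / 2000000` gives the same 1000 as before. If the outer iterations did depend on each other's results, the barriers could not simply be dropped like this.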

answered Sep 17 '22 21:09

Josh Milthorpe
