Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

parallelize inner loop using openmp

I have three nested loops but only the innermost is parallelizable. The outer and middle loop stop conditions depend on the calculations done by the innermost loop and therefore I cannot change the order.

I have used a OPENMP pragma directive just before the innermost loop but the performance with two threads is worst than with one. I guess it is because the threads are being created every iteration of the outer loops.

Is there any way to create the threads outside the outer loops but just use it in the innermost loop?

Thanks in advance

like image 950
Hernan Avatar asked Feb 05 '11 13:02

Hernan


People also ask

How do I parallelize nested loops in OpenMP?

Parallelizing nested loops. If we have nested for loops, it is often enough to simply parallelize the outermost loop: a(); #pragma omp parallel for for (int i = 0; i < 4; ++i) { for (int j = 0; j < 4; ++j) { c(i, j); } } z(); This is all that we need most of the time.

Can you parallelize a while loop?

evaluated independently by all iterations, the parallelized WHILE loop could continue to execute beyond the point where the original sequential loop would stop, i.e., it can overshoot.

Is nested parallelism possible in OpenMP?

OpenMP parallel regions can be nested inside each other. If nested parallelism is disabled, then the new team created by a thread encountering a parallel construct inside a parallel region consists only of the encountering thread. If nested parallelism is enabled, then the new team may consist of more than one thread.

What is multithreading approach used in OpenMP?

OpenMP is an implementation of multithreading, a method of parallelizing whereby a primary thread (a series of instructions executed consecutively) forks a specified number of sub-threads and the system divides a task among them.


2 Answers

OpenMP should be using a thread-pool, so you won't be recreating threads every time you execute your loop. Strictly speaking, however, that might depend on the OpenMP implementation you are using (I know the GNU compiler uses a pool). I suggest you look for other common problems, such as false sharing.

like image 141
ltjax Avatar answered Sep 28 '22 05:09

ltjax


Unfortunately, current multicore computer systems are no good for such fine-grained inner-loop parallelism. It's not because of a thread creation/forking issue. As Itjax pointed out, virtually all OpenMP implementations exploit thread pools, i.e., they pre-create a number of threads, and threads are parked. So, there is actually no overhead of creating threads.

However, the problems of such parallelizing inner loops are the following two overhead:

  • Dispatching jobs/tasks to threads: even if we don't need to physically create threads, at least we must assign jobs (= create logical tasks) to threads which mostly requires synchronizations.
  • Joining threads: after all threads in a team, then these threads should be joined (unless nowait OpenMP directive used). This is typically implemented as a barrier operation, which is also very intensive synchronization.

Hence, one should minimize the actual number of thread assigning/joining. You may decrease such overhead by increasing the amount of work of the inner loop per invocation. This could be done by some code changes like loop unrolling.

like image 23
minjang Avatar answered Sep 28 '22 06:09

minjang