I have three nested loops but only the innermost is parallelizable. The outer and middle loop stop conditions depend on the calculations done by the innermost loop and therefore I cannot change the order.
I have used an OpenMP pragma directive just before the innermost loop, but the performance with two threads is worse than with one. I guess it is because the threads are being created on every iteration of the outer loops.
Is there any way to create the threads outside the outer loops but use them only in the innermost loop?
Thanks in advance
Parallelizing nested loops. If we have nested for loops, it is often enough to simply parallelize the outermost loop:

    a();
    #pragma omp parallel for
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            c(i, j);
        }
    }
    z();

This is all that we need most of the time.
Because the stop condition is evaluated independently by all iterations, the parallelized WHILE loop could continue to execute beyond the point where the original sequential loop would stop, i.e., it can overshoot.
OpenMP parallel regions can be nested inside each other. If nested parallelism is disabled, then the new team created by a thread encountering a parallel construct inside a parallel region consists only of the encountering thread. If nested parallelism is enabled, then the new team may consist of more than one thread.
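For instance, here is a minimal sketch (not from the original thread) of what this means in code; omp_set_max_active_levels is used to allow more than one active level of parallelism (older code used omp_set_nested):

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        omp_set_max_active_levels(2);      /* allow two active levels of parallelism */

        #pragma omp parallel num_threads(2)
        {
            int outer = omp_get_thread_num();

            /* Nested region: with nesting enabled, each outer thread gets its own
               team of 2 threads; with only one active level allowed, each inner
               team would consist of just the encountering thread. */
            #pragma omp parallel num_threads(2)
            {
                printf("outer thread %d, inner thread %d\n",
                       outer, omp_get_thread_num());
            }
        }
        return 0;
    }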
OpenMP is an implementation of multithreading, a method of parallelizing whereby a primary thread (a series of instructions executed consecutively) forks a specified number of sub-threads and the system divides a task among them.
OpenMP should be using a thread-pool, so you won't be recreating threads every time you execute your loop. Strictly speaking, however, that might depend on the OpenMP implementation you are using (I know the GNU compiler uses a pool). I suggest you look for other common problems, such as false sharing.
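To illustrate the false-sharing point, here is a hypothetical sketch (not code from the question): if every thread repeatedly updated sums[tid], neighbouring elements would share a cache line and that line would bounce between cores; accumulating into a private variable and writing the shared slot once avoids this:

    #include <stdio.h>
    #include <omp.h>

    #define N (1 << 22)

    int main(void)
    {
        static double data[N];
        for (int i = 0; i < N; ++i)
            data[i] = 1.0;

        double sums[64] = {0};              /* one slot per thread; assumes <= 64 threads */

        #pragma omp parallel
        {
            int tid = omp_get_thread_num();
            double local = 0.0;             /* private accumulator avoids false sharing */

            #pragma omp for
            for (int i = 0; i < N; ++i)
                local += data[i];           /* the false-sharing variant would instead do
                                               sums[tid] += data[i]; on every iteration */

            sums[tid] = local;              /* a single write to the shared array */
        }

        double total = 0.0;
        for (int t = 0; t < 64; ++t)
            total += sums[t];
        printf("total = %f\n", total);
        return 0;
    }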
Unfortunately, current multicore computer systems are not well suited to such fine-grained inner-loop parallelism. It's not because of a thread creation/forking issue: as Itjax pointed out, virtually all OpenMP implementations use thread pools, i.e., they pre-create a number of threads and keep them parked, so there is actually no overhead of creating threads.
However, parallelizing inner loops like this introduces two kinds of overhead: dispatching (assigning) work to the worker threads, and joining the worker threads again at the end of the parallel region.
Hence, one should minimize the number of these assign/join operations. You can decrease this overhead by increasing the amount of work the inner loop does per invocation, for example through code changes such as loop unrolling.
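To answer the original question directly: one common pattern is to open the parallel region once, outside the outer loops, and put only a work-sharing construct on the innermost loop. Below is a minimal sketch under assumed conditions (the array x, the convergence-style stop condition, and the halving update are hypothetical placeholders, not the asker's code); note that every thread executes the outer control flow redundantly, so updates to the shared stop-condition variables are done by a single thread and separated by barriers:

    #include <stdio.h>
    #include <omp.h>

    #define N (1 << 20)

    static double x[N];   /* hypothetical data; stands in for the real computation */

    int main(void)
    {
        for (int k = 0; k < N; ++k)
            x[k] = 1.0;

        double err  = 1.0;   /* shared stop-condition value computed by the inner loop */
        int    iter = 0;

        #pragma omp parallel shared(err, iter)     /* the team is created once, here */
        {
            while (err > 1e-6 && iter < 1000) {    /* every thread evaluates the condition */
                #pragma omp barrier                /* ensure all threads have read err/iter
                                                      before one of them resets them */
                #pragma omp single
                {
                    err = 0.0;
                    ++iter;
                }                                  /* implicit barrier after single */

                #pragma omp for reduction(+ : err) /* only this loop is shared out */
                for (int k = 0; k < N; ++k) {
                    x[k] *= 0.5;                   /* stand-in for the real inner-loop work */
                    err  += x[k] * x[k];
                }                                  /* implicit barrier: all threads see the
                                                      reduced err before re-testing the loop */
            }
        }

        printf("finished after %d iterations, err = %g\n", iter, err);
        return 0;
    }

With this structure the threads are created (or woken from the pool) a single time, and only the work-sharing and barrier costs remain inside the outer loops, which is usually much cheaper than re-entering a parallel region on every iteration.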