If I use nested parallel for loops like this:
#pragma omp parallel for schedule(dynamic,1)
for (int x = 0; x < x_max; ++x) {
    #pragma omp parallel for schedule(dynamic,1)
    for (int y = 0; y < y_max; ++y) {
        // parallelize this code here
    }
    // IMPORTANT: no code in here
}
is this equivalent to:
for (int x = 0; x < x_max; ++x) {
    #pragma omp parallel for schedule(dynamic,1)
    for (int y = 0; y < y_max; ++y) {
        // parallelize this code here
    }
    // IMPORTANT: no code in here
}
Is the outer parallel for doing anything other than creating a new task?
If your compiler supports OpenMP 3.0, you can use the collapse clause:

#pragma omp parallel for schedule(dynamic,1) collapse(2)
for (int x = 0; x < x_max; ++x) {
    for (int y = 0; y < y_max; ++y) {
        // parallelize this code here
    }
    // IMPORTANT: no code in here
}
If it doesn't (e.g. only OpenMP 2.5 is supported), there is a simple workaround:
#pragma omp parallel for schedule(dynamic,1)
for (int xy = 0; xy < x_max*y_max; ++xy) {
    int x = xy / y_max;
    int y = xy % y_max;
    // parallelize this code here
}
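For illustration, here is a self-contained sketch of that workaround (the dimensions X_MAX/Y_MAX and the per-cell computation are made-up placeholders, not from the question); compile with e.g. gcc -fopenmp. With OpenMP 3.0 or later you could replace the linearization with collapse(2) on the original loop pair.

#include <stdio.h>

#define X_MAX 100
#define Y_MAX 100

int main(void) {
    static double grid[X_MAX][Y_MAX];

    // One work-sharing construct over the linearized index space:
    // x_max*y_max chunks, so even with dynamic,1 there is only one
    // scheduling level instead of a nested one.
    #pragma omp parallel for schedule(dynamic,1)
    for (int xy = 0; xy < X_MAX * Y_MAX; ++xy) {
        int x = xy / Y_MAX;         // recover the outer index
        int y = xy % Y_MAX;         // recover the inner index
        grid[x][y] = 0.5 * x + y;   // placeholder for the real per-cell work
    }

    printf("grid[1][2] = %f\n", grid[1][2]);
    return 0;
}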
You can enable nested parallelism with omp_set_nested(1); and your nested omp parallel for code will work, but that might not be the best idea.
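For completeness, a minimal sketch of what enabling nesting looks like; the num_threads(2) clauses are illustrative choices, and note that omp_set_nested is deprecated since OpenMP 5.0 in favour of omp_set_max_active_levels:

#include <omp.h>
#include <stdio.h>

int main(void) {
    omp_set_nested(1);             // deprecated since OpenMP 5.0 ...
    omp_set_max_active_levels(2);  // ... this is the modern replacement

    #pragma omp parallel num_threads(2)
    {
        #pragma omp parallel num_threads(2)
        {
            // with nesting enabled, 2 outer x 2 inner = 4 threads reach here
            #pragma omp critical
            printf("level %d: thread %d of %d\n",
                   omp_get_level(), omp_get_thread_num(), omp_get_num_threads());
        }
    }
    return 0;
}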
By the way, why the dynamic scheduling? Is every loop iteration evaluated in non-constant time?
NO.
The first #pragma omp parallel will create a team of parallel threads, and the second will then try to create, for each of the original threads, another team, i.e. a team of teams. However, on almost all existing implementations the second team has only one thread: the second parallel region is essentially unused. Thus, your code is in effect equivalent to
#pragma omp parallel for schedule(dynamic,1)
for (int x = 0; x < x_max; ++x) {    // only one x per thread
    for (int y = 0; y < y_max; ++y) {
        // code here: each thread loops all y
    }
}
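You can verify this on your own system by printing the team size at each nesting level: with nested parallelism left at its default (disabled), the inner region reports a team of one. A minimal sketch, where num_threads(4) is an arbitrary choice:

#include <omp.h>
#include <stdio.h>

int main(void) {
    #pragma omp parallel num_threads(4)
    {
        int outer = omp_get_num_threads();   // typically 4
        #pragma omp parallel                 // nesting disabled by default
        {
            // each inner team consists of just the encountering thread
            #pragma omp critical
            printf("outer team: %d, inner team: %d\n",
                   outer, omp_get_num_threads());   // inner team prints 1
        }
    }
    return 0;
}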
If you don't want that and want to parallelise only the inner loop, you can do this:
#pragma omp parallel
for (int x = 0; x < x_max; ++x) {    // each thread loops over all x
    #pragma omp for schedule(dynamic,1)
    for (int y = 0; y < y_max; ++y) {
        // code here, only one y per thread
    }
}
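Note that the inner omp for ends with an implicit barrier, so the team re-synchronises once per x iteration; that per-iteration scheduling and barrier cost is one reason the collapse/linearization approaches above are usually preferable. If the iterations are fully independent you could add a nowait clause to the inner for, but whether that is safe depends on the code inside the loop.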