Understanding the collapse clause in openmp

Tags:

I came across an OpenMP code that had the collapse clause, which was new to me. I'm trying to understand what it means, but I don't think I have fully grasped it's implications; One definition that I found is:

COLLAPSE: Specifies how many loops in a nested loop should be collapsed into one large iteration space and divided according to the schedule clause. The sequential execution of the iterations in all associated loops determines the order of the iterations in the collapsed iteration space.

I thought I understood what that meant, so I tried the follwoing simple program:

int i, j;
#pragma omp parallel for num_threads(2) private(j)
for (i = 0; i < 4; i++)
    for (j = 0; j <= i; j++)
        printf("%d %d %d\n", i, j, omp_get_thread_num());

Which produced

I then added the collapse(2) clause. I expected to have the same result in the first two columns but now have an equal number of 0's and 1's in the last column. But I got

So my questions are:

What is happening in my code?
Under what circumstances should I use collapse?
Can you provide an example that shows the difference between using collapse and not using it?

430

asked Feb 12 '15 16:02

iomartin

1 Answers

The problem with your code is that the iterations of the inner loop depend on the outer loop. According to the OpenMP specification under the description of the section on binding and the collapse clause:

If execution of any associated loop changes any of the values used to compute any of the iteration counts, then the behavior is unspecified.

You can use collapse when this is not the case for example with a square loop

#pragma omp parallel for private(j) collapse(2) for (i = 0; i < 4; i++)     for (j = 0; j < 100; j++)

In fact this is a good example to show when to use collapse. The outer loop only has four iterations. If you have more than four threads then some will be wasted. But when you collapse the threads will distribute among 400 iterations which is likely to be much greater than the number of threads. Another reason to use collapse is if the load is not well distributed. If you only used four iterations and the fourth iteration took most of the time the other threads wait. But if you use 400 iterations the load is likely to be better distributed.

You can fuse a loop by hand for the code above like this

#pragma omp parallel for for(int n=0; n<4*100; n++) {     int i = n/100; int j=n%100;

Here is an example showing how to fuse a triply fused loop by hand.

Finally, here is an example showing how to fuse a triangular loop which collapse is not defined for.

Here is a solution that maps a rectangular loop to the triangular loop in the OPs question. This can be used to fuse the OPs triangular loop.

//int n = 4; for(int k=0; k<n*(n+1)/2; k++) {     int i = k/(n+1), j = k%(n+1);     if(j>i) i = n - i -1, j = n - j;     printf("(%d,%d)\n", i,j); }

This works for any value of n.

The map for the OPs question goes from

(0,0), (1,0), (1,1), (2,0), (2,1), (2,2), (3,0), (3,1), (3,2), (3,3),

(0,0), (3,3), (3,2), (3,1), (3,0), (1,0), (1,1), (2,2), (2,1), (2,0),

For odd values of n the map is not exactly a rectangle but the formula still works.

For example n = 3 gets mapped from

(0,0), (1,0), (1,1), (2,0), (2,1), (2,2),

(0,0), (2,2), (2,1), (2,0), (1,0), (1,1),

Here is code to test this

#include <stdio.h> int main(void) {     int n = 4;     for(int i=0; i<n; i++) {         for(int j=0; j<=i; j++) {             printf("(%d,%d)\n", i,j);         }     }     puts("");     for(int k=0; k<n*(n+1)/2; k++) {         int i = k/(n+1), j = k%(n+1);         if(j>i) i = n - i - 1, j = n - j;         printf("(%d,%d)\n", i,j);     } }

181

answered Oct 14 '22 12:10

Z boson

Related questions
                            
                                How do I align a number like this in C?
                            
                                Except OOP, why is C++ better than C? [closed]
                            
                                What is "int i = 1;Why (i >= 60 * 60 * 1000 / 1 * 1000)" true?
                            
                                C syntax for functions returning function pointers
                            
                                Fast AVX512 modulo when same divisor
                            
                                Multi-threaded debugging tutorial for GDB and C [closed]
                            
                                Is epoll thread-safe?
                            
                                How to embed WebKit into my C/C++/Win32 application?
                            
                                How do I use a C library in a Rust library compiled to WebAssembly?
                            
                                How to play MP3 files in C?
                            
                                How to provide an implementation of memcpy
                            
                                C: unary minus operator behavior with unsigned operands
                            
                                Is there some meaningful statistical data to justify keeping signed integer arithmetic overflow undefined?
                            
                                How is sleep implemented at the OS level?
                            
                                Where can I find a complete reference of the ncurses C API? [closed]
                            
                                Python - Best/Cleanest way to define constant lists or dictionarys
                            
                                Can GCC warn me about modifying the fields of a const struct in C99?
                            
                                size_t vs int in C++ and/or C
                            
                                How can I pass a C++ lambda to a C-callback that expects a function pointer and a context?
                            
                                C extension: <? and >? operators

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Understanding the collapse clause in openmp

Tags:

c

openmp

iomartin

People also ask

1 Answers

Z boson

Recent Activity

Donate For Us