Starting a thread for each inner loop in OpenMP

I'm fairly new to OpenMP and I'm trying to start an individual thread to process each item in a 2D array.

So essentially, this:

for (i = 0; i < dimension; i++) {
    for (int j = 0; j < dimension; j++) {
        a[i][j] = b[i][j] + c[i][j];
    }
}

What I'm doing is this:

#pragma omp parallel for shared(a,b,c) private(i,j) reduction(+:diff) schedule(dynamic)
    for (i = 0; i < dimension; i++) {
        for (int j = 0; j < dimension; j++) {
            a[i][j] = b[i][j] + c[i][j];
        }
    }

Does this in fact start a thread for each 2D item or no? How would I test that? If it is wrong, what is the correct way to do it? Thanks!

Note: The code has been greatly simplified.

achinda99 asked Feb 07 '10 03:02


1 Answer

Only the outer loop is parallel in your code sample. You can test this by printing omp_get_thread_num() in the inner loop: for a given i, the thread number is the same for every j. (This test is demonstrative rather than definitive, since the thread assignment and the output order can differ between runs.) For example, with:

#include <stdio.h>
#include <omp.h>
#define dimension 4

int main(void) {
    /* Only the outer loop (over i) is divided among the threads. */
    #pragma omp parallel for
    for (int i = 0; i < dimension; i++)
        for (int j = 0; j < dimension; j++)
            printf("i=%d, j=%d, thread = %d\n", i, j, omp_get_thread_num());
    return 0;
}

I get:

i=1, j=0, thread = 1
i=3, j=0, thread = 3
i=2, j=0, thread = 2
i=0, j=0, thread = 0
i=1, j=1, thread = 1
i=3, j=1, thread = 3
i=2, j=1, thread = 2
i=0, j=1, thread = 0
i=1, j=2, thread = 1
i=3, j=2, thread = 3
i=2, j=2, thread = 2
i=0, j=2, thread = 0
i=1, j=3, thread = 1
i=3, j=3, thread = 3
i=2, j=3, thread = 2
i=0, j=3, thread = 0
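Reading the printed output by eye works, but the same check can be done in code. The following is a minimal sketch, not part of the original test: the names DIM, thread_id and rows_have_single_thread are made up for illustration, and the #ifdef guard is only there so the file still builds when OpenMP is not enabled (with gcc, OpenMP is enabled by -fopenmp). It records which thread handled each (i, j) cell and then verifies that every row was handled by exactly one thread.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

#define DIM 4  /* hypothetical size, playing the role of `dimension` */

/* Thread number of the caller, or 0 when compiled without OpenMP. */
static int thread_id(void)
{
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/* Returns 1 if every row i was handled by a single thread, else 0. */
int rows_have_single_thread(void)
{
    int owner[DIM][DIM];  /* declared outside the region, so shared */

    /* Only the outer loop is parallel, as in the test program. */
    #pragma omp parallel for
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++)
            owner[i][j] = thread_id();

    /* Sequential check: all cells of a row must have the same owner. */
    for (int i = 0; i < DIM; i++)
        for (int j = 1; j < DIM; j++)
            if (owner[i][j] != owner[i][0])
                return 0;
    return 1;
}
```

Calling rows_have_single_thread() from main and printing the result should give 1 for this pragma, because each iteration of the outer loop is executed in full by one thread.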

As for the rest of your code, you might want to put more details in a new question (it's difficult to tell from the small sample). For example, you can't write private(j) when j is only declared later, inside the loop. A variable declared inside the parallel loop is automatically private, as j is in my example above. I guess diff is a variable that we can't see in the sample. Also, the loop variable i is automatically private (quoting the version 2.5 spec; the 3.0 spec says the same):

The loop iteration variable in the for-loop of a for or parallel for construct is private in that construct.
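Putting those points together, the loop from the question could be written roughly as below. This is only a sketch under assumptions: DIM and add_matrices are illustrative names, the arrays are assumed to be fixed-size arrays of double, and the reduction(+:diff) clause is left out because diff does not appear in the posted code.

```c
#define DIM 4  /* hypothetical size, standing in for `dimension` */

/* Element-wise a = b + c; only the outer loop is divided among threads. */
void add_matrices(double a[DIM][DIM], double b[DIM][DIM], double c[DIM][DIM])
{
    /* No private(i, j) is needed: i is the parallel loop variable and is
       private automatically, and j is declared inside the loop body, so
       each thread has its own copy. */
    #pragma omp parallel for shared(a, b, c) schedule(dynamic)
    for (int i = 0; i < DIM; i++) {
        for (int j = 0; j < DIM; j++) {
            a[i][j] = b[i][j] + c[i][j];
        }
    }
}
```

If diff really is accumulated inside the loop, the reduction clause would go back on the pragma, with diff declared and initialized before the loop.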

Edit: All of the above is correct for the code that you and I have shown, but you may be interested in the following. For OpenMP Version 3.0 (available in e.g. gcc version 4.4, but not version 4.3) there is a collapse clause where you could write the code as you have, but with #pragma omp parallel for collapse (2) to parallelize both for loops (see the spec).

Edit: OK, I downloaded gcc 4.5.0 and ran the test program above with collapse (2) added to the pragma. I got the following output, which shows that the inner loop is now parallelized as well (each row i is split between two threads):

i=0, j=0, thread = 0
i=0, j=2, thread = 1
i=1, j=0, thread = 2
i=2, j=0, thread = 4
i=0, j=1, thread = 0
i=1, j=2, thread = 3
i=3, j=0, thread = 6
i=2, j=2, thread = 5
i=3, j=2, thread = 7
i=0, j=3, thread = 1
i=1, j=1, thread = 2
i=2, j=1, thread = 4
i=1, j=3, thread = 3
i=3, j=1, thread = 6
i=2, j=3, thread = 5
i=3, j=3, thread = 7
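For reference, a collapse(2) variant might look like the sketch below. It stores the thread number for each cell instead of printing it, which avoids interleaved output. The names DIM, current_thread and record_owners are hypothetical, and the #ifdef guard is an addition so that the code still compiles (and runs on one thread) when OpenMP is not enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

#define DIM 4  /* hypothetical size, playing the role of `dimension` */

/* Thread number of the caller, or 0 when compiled without OpenMP. */
static int current_thread(void)
{
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/* Fills owner[i][j] with the number of the thread that ran iteration (i, j). */
void record_owners(int owner[DIM][DIM])
{
    /* collapse(2) merges both loops into one iteration space of DIM * DIM
       iterations, which is then divided among the threads. The loops must
       be perfectly nested for this to be valid. */
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++)
            owner[i][j] = current_thread();
}
```

With enough threads available, cells in the same row can end up with different owners, which is what the output above shows. How the iterations are actually distributed depends on the schedule and the number of threads.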

Comments here (search for "Workarounds") are also relevant for workarounds in version 2.5 if you want to parallelize both loops, but the version 2.5 spec cited above is quite explicit (see the non-conforming examples in section A.35).
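One workaround that is commonly used without collapse is to flatten the two loops into a single loop by hand and recover i and j from the combined index. The sketch below illustrates that idea. It is an assumption that this matches the workarounds in the comments referred to above, and DIM and add_matrices_flat are illustrative names.

```c
#define DIM 4  /* hypothetical size, standing in for `dimension` */

/* Element-wise a = b + c, with all DIM * DIM cells divided among threads. */
void add_matrices_flat(double a[DIM][DIM], double b[DIM][DIM], double c[DIM][DIM])
{
    /* A single loop over the combined index k, so a plain parallel for
       distributes individual cells rather than whole rows. */
    #pragma omp parallel for
    for (int k = 0; k < DIM * DIM; k++) {
        int i = k / DIM;  /* row index recovered from k */
        int j = k % DIM;  /* column index recovered from k */
        a[i][j] = b[i][j] + c[i][j];
    }
}
```

The cost is an integer division and a remainder per iteration, which is usually small next to the loop body but worth measuring for very cheap bodies like this one.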

Ramashalanka answered Oct 22 '22 13:10