Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Recursive factorial behavior with OpenMP tasks?

Tags:

c

openmp

I'm wondering what is the behavior of the following code. This is a simple factorial program which is parallelized using OpenMP tasks. When I run this program I get 0 as output. I'm aware of missing shared(b) clause. Here is the way I call fact from the main.

#pragma omp parallel
{
    #pragma omp single nowait
    {
        fact_result = fact(4);
    }
}

Here is the actual factorial function.

int fact(int n) {

    if(n == 0)
        return 1;

    int b = 0;

    #pragma omp task
    b = fact(n - 1);

    #pragma omp taskwait
    return n * b;

}

I can't understand why the result is incorrect without shared(b) and why it is correct with shared(b). I'm looking for a runtime diagram like this:

Tasks spanning in a recursive loop

I know that threads start working on the tasks as soon as the step 1 completes, but what are they working with? What variables? Are they really doing anything? Can we really speed up a recursive computation like this? Sometimes there are plenty of leaves in a tree that we can speed up the merging operation using multiple threads but in this case, there is only 1 leaf in the end. I really appreciate if someone could explain the behavior of this code. Thanks in advance.

like image 470
hexpheus Avatar asked Apr 25 '26 14:04

hexpheus


1 Answers

I can't understand why the result is incorrect

The data-sharing attribute for b in the task is firstprivate as per the standard rules 2.15.1.1

In a task generating construct, if no default clause is present, a variable for which the data-sharing attribute is not determined by the rules above is firstprivate.

So logically, your task looks like this:

#pragma omp task
{
    int n_private = n;
    int b_private = b;
    b_private = fact(n_private - 1);
}

The result of the firstprivate b is just discarded.

Can we really speed up a recursive computation like this?

No*. You do not have a real tree - just a chain. You can only speed up recursive computation if you have more than one child at some nodes in the tree.

*: If there is significant work on each node, and some part of that work does not depend on the recursive call, you can do some pipelining.

Edit:

why this code works by adding a shared(b)

Because then the result of fact is not discarded and written to the variable b from the outer scope.

I have problems understanding the way taskwait works. Does that really wait for ALL the tasks in the queue to get done?

It waits for all (direct) child tasks to complete. Because the child tasks also wait for their child tasks, it is transitive in your case even for grandchildren.

You can think of it like that:

| time
V
fact(4)
*-----> fact(3)
twait   *-----> fact(2)
 |      twait   *-----> fact(1)
 |       |      twait   *-----> fact(0)
 |       |       |      twait   return
 |       |       |      done  
 |       |       |      return
 |       |      done
 |       |      return
 |      done
 |      return
done
return

If yes, then how does an intermediate task like b = fact(3) is done while its result is not resolved yet(because its result is also waiting at taskwait!).

It's not.

like image 130
Zulan Avatar answered Apr 27 '26 23:04

Zulan



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!