I'm wondering what is the behavior of the following code. This is a simple factorial program which is parallelized using OpenMP tasks. When I run this program I get 0 as output. I'm aware of missing shared(b) clause. Here is the way I call fact from the main.
#pragma omp parallel
{
#pragma omp single nowait
{
fact_result = fact(4);
}
}
Here is the actual factorial function.
int fact(int n) {
if(n == 0)
return 1;
int b = 0;
#pragma omp task
b = fact(n - 1);
#pragma omp taskwait
return n * b;
}
I can't understand why the result is incorrect without shared(b) and why it is correct with shared(b). I'm looking for a runtime diagram like this:

I know that threads start working on the tasks as soon as the step 1 completes, but what are they working with? What variables? Are they really doing anything? Can we really speed up a recursive computation like this? Sometimes there are plenty of leaves in a tree that we can speed up the merging operation using multiple threads but in this case, there is only 1 leaf in the end. I really appreciate if someone could explain the behavior of this code. Thanks in advance.
I can't understand why the result is incorrect
The data-sharing attribute for b in the task is firstprivate as per the standard rules 2.15.1.1
In a task generating construct, if no default clause is present, a variable for which the data-sharing attribute is not determined by the rules above is firstprivate.
So logically, your task looks like this:
#pragma omp task
{
int n_private = n;
int b_private = b;
b_private = fact(n_private - 1);
}
The result of the firstprivate b is just discarded.
Can we really speed up a recursive computation like this?
No*. You do not have a real tree - just a chain. You can only speed up recursive computation if you have more than one child at some nodes in the tree.
*: If there is significant work on each node, and some part of that work does not depend on the recursive call, you can do some pipelining.
Edit:
why this code works by adding a shared(b)
Because then the result of fact is not discarded and written to the variable b from the outer scope.
I have problems understanding the way taskwait works. Does that really wait for ALL the tasks in the queue to get done?
It waits for all (direct) child tasks to complete. Because the child tasks also wait for their child tasks, it is transitive in your case even for grandchildren.
You can think of it like that:
| time
V
fact(4)
*-----> fact(3)
twait *-----> fact(2)
| twait *-----> fact(1)
| | twait *-----> fact(0)
| | | twait return
| | | done
| | | return
| | done
| | return
| done
| return
done
return
If yes, then how does an intermediate task like b = fact(3) is done while its result is not resolved yet(because its result is also waiting at taskwait!).
It's not.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With