Node *head = &node1;
while (head)
{
    #pragma omp task
    cout << head->value << endl;
    head = head->next;
}

#pragma omp parallel
{
    #pragma omp single
    {
        Node *head = &node1;
        while (head)
        {
            #pragma omp task
            cout << head->value << endl;
            head = head->next;
        }
    }
}
In the first block, I just create tasks without a parallel directive, while in the second block I use the parallel directive together with the single directive, which is the common pattern I have seen in papers. I wonder what the difference between them is. BTW, I know the basic meaning of these directives.
The code in my comment:
void traverse(node *root)
{
    if (root->left)
    {
        #pragma omp task
        traverse(root->left);
    }
    if (root->right)
    {
        #pragma omp task
        traverse(root->right);
    }
    process(root);
}
The difference is that in the first block you are not really creating any tasks, since the block itself is not nested (neither syntactically nor dynamically) inside an active parallel region. In the second block the task construct is syntactically nested inside the parallel region and will queue explicit tasks if the region happens to be active at run time (an active parallel region is one that executes with a team of more than one thread). Dynamic nesting is less obvious. Observe the following example:
void foo(void)
{
    int i;
    for (i = 0; i < 10; i++)
        #pragma omp task
        bar();
}

int main(void)
{
    foo();

    #pragma omp parallel num_threads(4)
    {
        #pragma omp single
        foo();
    }

    return 0;
}
The first call to foo() happens outside of any parallel region, hence the task directive does (almost) nothing and all calls to bar() happen serially. The second call to foo() comes from inside the parallel region, so new tasks are generated inside foo(). The parallel region is active since the number of threads was fixed to 4 by the num_threads(4) clause.
This different behaviour of the OpenMP directives is a design feature. The main idea is to be able to write code that can execute both serially and in parallel.
Still, the presence of the task construct in foo() triggers some code transformation; e.g. foo() is transformed into something like:
void foo_omp_fn_1(void *omp_data)
{
    bar();
}

void foo(void)
{
    int i;
    for (i = 0; i < 10; i++)
        OMP_make_task(foo_omp_fn_1, NULL);
}
Here OMP_make_task() is a hypothetical (not publicly available) function from the OpenMP support library that queues a call to the function supplied as its first argument. If OMP_make_task() detects that it is operating outside an active parallel region, it simply calls foo_omp_fn_1() instead. This adds some overhead to the call to bar() in the serial case: instead of main -> foo -> bar, the call goes main -> foo -> OMP_make_task -> foo_omp_fn_1 -> bar. The implication is slower serial code execution.
This is even more obviously illustrated with the worksharing directive:
void foo(void)
{
    int i;
    #pragma omp for
    for (i = 0; i < 12; i++)
        bar();
}

int main(void)
{
    foo();

    #pragma omp parallel num_threads(4)
    {
        foo();
    }

    return 0;
}
The first call to foo() would run the loop serially. The second call would distribute the 12 iterations among the 4 threads, i.e. each thread would execute only 3 iterations. Once again, some code-transformation magic is used to achieve this, and the serial loop would run slower than if no #pragma omp for were present in foo().
The lesson here is to never add OpenMP constructs where they are not really necessary.