Node *head = &node1;
while (head)
{
    #pragma omp task
    cout << head->value << endl;
    head = head->next;
}

#pragma omp parallel
{
    #pragma omp single
    {
        Node *head = &node1;
        while (head)
        {
            #pragma omp task
            cout << head->value << endl;
            head = head->next;
        }
    }
}
In the first block, I just create tasks without a parallel directive, while in the second block I use the parallel directive together with the single directive, which is the common pattern I have seen in papers. I wonder what the difference between them is. BTW, I know the basic meaning of these directives.
The code in my comment:
void traverse(node *root)
{
    if (root->left)
    {
        #pragma omp task
        traverse(root->left);
    }
    if (root->right)
    {
        #pragma omp task
        traverse(root->right);
    }
    process(root);
}
The difference is that in the first block you are not really creating any tasks, since the block itself is not nested (neither syntactically nor dynamically) inside an active parallel region. In the second block the task construct is syntactically nested inside the parallel region and will queue explicit tasks if the region happens to be active at run time (an active parallel region is one that executes with a team of more than one thread). Dynamic nesting is less obvious. Observe the following example:
void foo(void)
{
    int i;
    for (i = 0; i < 10; i++)
        #pragma omp task
        bar();
}

int main(void)
{
    foo();

    #pragma omp parallel num_threads(4)
    {
        #pragma omp single
        foo();
    }

    return 0;
}
The first call to foo() happens outside of any parallel region, hence the task directive does (almost) nothing and all calls to bar() happen serially. The second call to foo() comes from inside the parallel region, so new tasks are generated inside foo(). The parallel region is active since the number of threads was fixed to 4 by the num_threads(4) clause.
This different behaviour of the OpenMP directives is a design feature. The main idea is to be able to write code that can execute both serially and in parallel.
Still, the presence of the task construct in foo() triggers some code transformation; e.g. foo() is transformed into something like:
void foo_omp_fn_1(void *omp_data)
{
    bar();
}

void foo(void)
{
    int i;
    for (i = 0; i < 10; i++)
        OMP_make_task(foo_omp_fn_1, NULL);
}
Here OMP_make_task() is a hypothetical (not publicly available) function from the OpenMP support library that queues a call to the function supplied as its first argument. If OMP_make_task() detects that it is operating outside an active parallel region, it simply calls foo_omp_fn_1() instead. This adds some overhead to the call to bar() in the serial case: instead of main -> foo -> bar, the call goes main -> foo -> OMP_make_task -> foo_omp_fn_1 -> bar. The implication is slower serial code execution.
This is even more obviously illustrated with the worksharing directive:
void foo(void)
{
    int i;
    #pragma omp for
    for (i = 0; i < 12; i++)
        bar();
}

int main(void)
{
    foo();

    #pragma omp parallel num_threads(4)
    {
        foo();
    }

    return 0;
}
The first call to foo() would run the loop serially. The second call would distribute the 12 iterations among the 4 threads, i.e. each thread would execute only 3 iterations. Once again, some code-transformation magic is used to achieve this, and the serial loop would run slower than if no #pragma omp for were present in foo().
The lesson here is to never add OpenMP constructs where they are not really necessary.