I am confused about omp single and omp task directives. I have read several examples which use both of them. The following example shows how to use the task construct to process elements of a linked list.
1 #pragma omp parallel
2 {
3 #pragma omp single
4 {
5 for(node* p = head; p; p = p->next)
6 {
7 #pragma omp task
8 process(p);
9 }
10 }
11 }
I am failing to understand the parallelism in this example. With omp single, only one thread will execute the structured block related to the single construct, is it right? In this example, Lines 4-10 is the structured block related to the single construct and it can be executed only once, then why we can use omp task inside this structured block? How can it be worked in a parallel manner?
Adding to the other answers, let me dig a bit deeper into what happens during execution.
1 #pragma omp parallel
2 {
3 #pragma omp single
4 {
5 for(node* p = head; p; p = p->next)
6 {
7 #pragma omp task
8 process(p);
9 }
10 } // barrier of single construct
11 }
In the code, I have marked a barrier that is introduced at the end of the single construct.
What happens is this:
First, when encountering the parallel construct, the main thread spawns the parallel region and creates a bunch of worker threads. Then you have n threads running and executing the parallel region.
Second, the single construct picks any one of the n threads and executes the code inside the curly braces of the single construct. All other n-1 threads will proceed to the barrier in line 10. There, they will wait for the last thread to catch up and complete the barrier synchronization. While these threads are waiting there, they are not only wasting time but also wait for work to arrive.
Third, the thread that was picked by the single construct (the "producer") executes the for loop and for each iteration it creates a new task. This task is then put into a task pool so that another thread (one of the ones in the barrier) can pick it up and execute it. Once the producer is done creating tasks, it will join the barrier and if there are still tasks in the task pool waiting for execution, it will help the other threads execute tasks.
Fourth, once all tasks have been generated and executed that way, all threads are done and the barrier synchronization is complete.
I have simplified a bit here and there, as there are more aspects to how an OpenMP implementation can execute tasks, but from a conceptual point of view, the above is what you can think of is happening until you're ready to dive into specific aspects of task scheduling in the OpenMP API.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With