Looking at the document here, the following construct is said to be well defined:
#pragma omp parallel //Line 1
{
#pragma omp for nowait //Line 3
for (i=0; i<N; i++)
a[i] = // some expression
#pragma omp for //Line 6
for (i=0; i<N; i++)
b[i] = ...... a[i] ......
}
since the document states:
Here the nowait clause implies that threads can start on the second loop while other threads are still working on the first. Since the two loops use the same schedule here, an iteration that uses a[i] can indeed rely on that value having been computed.
I am having a tough time understanding why this would be. Suppose Line 3 were:
#pragma omp for
then, since there is an implicit barrier just before Line 6, the next for loop will have the values at all indices of a fully computed. But with the nowait in Line 3, how would it work?
Suppose Line 1 triggers 4 threads, t1, t2, t3 and t4. Suppose N is 8 and the partition of indices in the first for loop is thus:
t1: 0, 4
t2: 1, 5
t3: 2, 6
t4: 3, 7
Suppose t1 completes indices 0 and 4 first and lands up at Line 6. What exactly happens now? How is it guaranteed that it now gets to operate on the same indices 0 and 4, for which the a values were correctly computed by it in the first loop? What if the second for loop accesses a[i+1]?
#pragma omp for provides a way to get rid of the implicit barrier at the end of the loop using the nowait keyword, but I did not use it. Checking for master and worker threads and the like is more of an MPI or pthreads style; the idea behind OpenMP is exactly to get rid of all this fiddling around between the master and the rest.
Yes, "There is an implicit barrier at the end of the parallel construct."
The implicit-barrier-wait-end event occurs when a task ends an interval of active or waiting and resumes execution of an implicit barrier region. The implicit-barrier-end event occurs in each implicit task after the barrier synchronization on exit from an implicit barrier region.
There is an implicit barrier at the end of the single construct unless a nowait clause is specified.
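To illustrate that last sentence, here is a minimal, self-contained sketch (the printed messages are made up for the example): with nowait, the threads that do not execute the single block skip its implicit barrier and move straight on to the following worksharing loop.
#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel
    {
        /* exactly one thread executes this block; nowait removes the
           implicit barrier at the end of the single construct */
        #pragma omp single nowait
        printf("setup done by thread %d\n", omp_get_thread_num());

        /* the other threads do not wait for the single block and can
           start picking up iterations immediately */
        #pragma omp for
        for (int i = 0; i < 8; i++)
            printf("iteration %d run by thread %d\n", i, omp_get_thread_num());
    }
    return 0;
}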
The material you quote is wrong. It becomes correct if you add schedule(static) to both loops - this guarantees the same distribution of indices among threads for successive loops. The default schedule is implementation defined; you cannot assume it to be static. To quote the standard:
Different loop regions with the same schedule and iteration count, even if they occur in the same parallel region, can distribute iterations among threads differently. The only exception is for the static schedule as specified in Table 2.5. Programs that depend on which thread executes a particular iteration under any other circumstances are non-conforming.
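For concreteness, a minimal sketch of the fixed construct under that reading (the loop bodies here are placeholders, since the original elides them):
#include <stdio.h>

#define N 8

int main(void) {
    double a[N], b[N];

    #pragma omp parallel
    {
        /* schedule(static) with the same iteration count and chunk size
           guarantees the same iteration-to-thread mapping in both loops,
           so dropping the barrier with nowait is safe */
        #pragma omp for schedule(static) nowait
        for (int i = 0; i < N; i++)
            a[i] = 2.0 * i;            /* placeholder expression */

        #pragma omp for schedule(static)
        for (int i = 0; i < N; i++)
            b[i] = a[i] + 1.0;         /* reads only a[i], same index */
    }

    for (int i = 0; i < N; i++)
        printf("b[%d] = %g\n", i, b[i]);
    return 0;
}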
If the second for loop accesses a[i+1], you must absolutely leave the barrier there.
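In that case, a hedged sketch of the safe form, assuming the same declarations as the sketch above, simply keeps the implicit barrier by dropping nowait:
    #pragma omp parallel
    {
        /* no nowait: the implicit barrier at the end of this loop makes
           every a[i] visible to every thread before the next loop starts */
        #pragma omp for schedule(static)
        for (int i = 0; i < N; i++)
            a[i] = 2.0 * i;            /* placeholder expression */

        #pragma omp for schedule(static)
        for (int i = 0; i < N - 1; i++)
            b[i] = a[i + 1];           /* crosses iterations, needs the barrier */
    }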
To me, the statement that there is no potential problem in the example is wrong.
Indeed, since no schedule is explicitly specified, both loops will use the same default schedule. Furthermore, if that scheduling were of static type, then indeed there wouldn't be any issue, since the thread that handles any given element of array a in the second loop would be the same one that wrote it in the first loop.
But the actual problem here is that the default scheduling is not defined by the OpenMP standard; it is implementation defined. For the (many) implementations where the default scheduling is static, there cannot be any race condition in the snippet. But if the default scheduling is dynamic, then, as you noticed, a race condition can happen and the result is undefined.
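A small experiment makes the point visible. The sketch below (the arrays who1 and who2 are just for this demonstration) forces schedule(dynamic,1) and records which thread ran each iteration of each loop. The barrier between the loops is kept so the experiment itself is race-free; it only shows that the two iteration-to-thread mappings are not guaranteed to match, which is exactly what nowait would have to rely on.
#include <stdio.h>
#include <omp.h>

#define N 8

int main(void) {
    int who1[N], who2[N];

    #pragma omp parallel
    {
        /* first loop: remember which thread executed each iteration */
        #pragma omp for schedule(dynamic, 1)
        for (int i = 0; i < N; i++)
            who1[i] = omp_get_thread_num();

        /* second loop: the dynamic schedule may hand the same i
           to a different thread */
        #pragma omp for schedule(dynamic, 1)
        for (int i = 0; i < N; i++)
            who2[i] = omp_get_thread_num();
    }

    for (int i = 0; i < N; i++)
        printf("i=%d  first loop: thread %d  second loop: thread %d\n",
               i, who1[i], who2[i]);
    return 0;
}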