According to the OpenMP Specification (v4.0), the following program contains a possible data race due to the unsynchronized read/write of i:
int i{0}; // std::atomic<int> i{0};

void write() {
    // #pragma omp atomic write // seq_cst
    i = 1;
}

int read() {
    int j;
    // #pragma omp atomic read // seq_cst
    j = i;
    return j;
}

int main() {
    #pragma omp parallel
    { /* code that calls both write() and read() */ }
}
Possible solutions that came to my mind are shown in the code as comments:
1. accessing i with #pragma omp atomic write/read,
2. accessing i with #pragma omp atomic write/read seq_cst,
3. using std::atomic<int> instead of int as the type of i.
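For reference, here is a minimal sketch with the options enabled side by side. The names i_plain, i_std and all_threads_saw_one are made up for this illustration, and the body of the parallel region is only an assumption about what "code that calls both write() and read()" could look like. Build with OpenMP enabled (e.g. -fopenmp) for the pragmas to take effect.

```cpp
#include <atomic>

int i_plain = 0;            // options 1 and 2: plain int accessed via OpenMP atomics
std::atomic<int> i_std{0};  // option 3: std::atomic<int> instead of int

void write_omp() {
    #pragma omp atomic write seq_cst
    i_plain = 1;
}

int read_omp() {
    int j;
    #pragma omp atomic read seq_cst
    j = i_plain;
    return j;
}

// Default memory order for store()/load() is std::memory_order_seq_cst.
void write_std() { i_std.store(1); }
int read_std() { return i_std.load(); }

// Hypothetical driver: every thread writes 1 and then reads the value back.
// Since 1 is the only value ever written, each thread must read 1 afterwards.
bool all_threads_saw_one() {
    int mismatches = 0;
    #pragma omp parallel reduction(+ : mismatches)
    {
        write_omp();
        write_std();
        if (read_omp() != 1) ++mismatches;
        if (read_std() != 1) ++mismatches;
    }
    return mismatches == 0;
}
```

Dropping seq_cst from the two pragmas gives option 1; keeping it gives option 2.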
Here are the compiler-generated instructions on x86_64 (with -O2 in all cases):
GNU g++ 4.9.2:                  i = 1;        j = i;
original code:                  MOV           MOV
#pragma omp atomic:             MOV           MOV
// #pragma omp atomic seq_cst:  MOV           MOV
#pragma omp atomic seq_cst:     MOV+MFENCE    MOV      (see UPDATE)
std::atomic<int>:               MOV+MFENCE    MOV

clang++ 3.5.0:                  i = 1;        j = i;
original code:                  MOV           MOV
#pragma omp atomic:             MOV           MOV
#pragma omp atomic seq_cst:     MOV           MOV
std::atomic<int>:               XCHG          MOV

Intel icpc 16.0.1:              i = 1;        j = i;
original code:                  MOV           MOV
#pragma omp atomic:             *             *
#pragma omp atomic seq_cst:     *             *
std::atomic<int>:               XCHG          MOV

* Multiple instructions with calls to __kmpc_atomic_xxx functions.
What I wonder is why the GNU and clang compilers do not generate any special instructions for #pragma omp atomic writes. I would expect instructions similar to those for std::atomic, i.e., either MOV+MFENCE or XCHG. Any explanation?
UPDATE
g++ 5.3.0 produces MFENCE for #pragma omp atomic write seq_cst. That is the correct behavior, I believe. Without seq_cst, it produces a plain MOV, which is sufficient for non-SC atomicity.
There was a bug in my Makefile: g++ 4.9.2 produces MFENCE for the SC atomic write as well. Sorry for that, guys.
Clang 3.5.0 does not implement the OpenMP SC atomics; thanks to Hristo Iliev for pointing this out.
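The SC vs. non-SC distinction can also be seen with std::atomic by spelling out the memory order. This is only a sketch, and the function names are made up for illustration. On x86_64, compilers typically emit a plain MOV for the relaxed store and either MOV+MFENCE or XCHG for the sequentially consistent store, while both loads are plain MOVs. The exact output depends on the compiler and version, so it is worth checking your own disassembly.

```cpp
#include <atomic>

std::atomic<int> shared_value{0};

// Atomic, but no ordering guarantees beyond atomicity itself.
void store_relaxed(int v) { shared_value.store(v, std::memory_order_relaxed); }

// Sequentially consistent store; this is also the default for store().
void store_seq_cst(int v) { shared_value.store(v, std::memory_order_seq_cst); }

int load_relaxed() { return shared_value.load(std::memory_order_relaxed); }

// Sequentially consistent load; this is also the default for load().
int load_seq_cst() { return shared_value.load(std::memory_order_seq_cst); }
```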
critical: the enclosed code block is executed by only one thread at a time, never simultaneously by multiple threads. It is often used to protect shared data from race conditions.
atomic: the memory update (write, or read-modify-write) in the next statement is performed atomically.
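A small sketch of the difference, assuming a hypothetical summarize() helper that is not part of the original question: a single in-place update of one variable fits atomic, whereas a compare-then-assign sequence spans several steps and needs critical. Build with OpenMP enabled (e.g. -fopenmp) for the pragmas to take effect.

```cpp
#include <vector>

struct Summary {
    long count;
    long largest;
};

// Hypothetical helper: counts the elements and finds the largest one in parallel.
Summary summarize(const std::vector<long>& values) {
    long count = 0;
    long largest = values.empty() ? 0 : values[0];
    const long n = static_cast<long>(values.size());

    #pragma omp parallel for
    for (long k = 0; k < n; ++k) {
        // One read-modify-write of a single scalar: atomic is sufficient.
        #pragma omp atomic update
        count += 1;

        // Compare and conditional assignment must not interleave with
        // another thread's compare/assign: use a critical section.
        #pragma omp critical
        {
            if (values[k] > largest) largest = values[k];
        }
    }
    return Summary{count, largest};
}
```

In real code a reduction clause would usually be the better tool for both values; the version above is only meant to contrast the two constructs.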
Loads and Stores
For that to be possible, such data must exist in shared memory or cache. Thus, an atomic load loads data from shared memory to either a register or thread-specific memory, depending on the processor architecture. Atomic stores move data into shared memory atomically.
Reads and writes of int are not guaranteed atomic in standard C++, and the resulting data race causes undefined behavior.
There are two possibilities.
The compiler is not obligated to convert C++ code containing a data race into bad machine code. Depending on the machine memory model, the instructions normally used may already be atomic and coherent. Take that same C++ code to another architecture and you may start seeing the pragmas cause differences that didn't exist on x86_64.
In addition to potentially causing use of different instructions and/or extra memory fence instructions, the atomic pragmas (as well as std::atomic and volatile) also constrain the compiler's own code-reordering optimizations. They may not apply to your simple case, but you could certainly see that common-subexpression elimination, including hoisting computations outside a loop, may be affected.