
OpenMP atomic and non-atomic reads/writes produce the same instructions on x86_64

Tags: c++, atomic, x86-64, memory-fences, openmp

According to the OpenMP Specification (v4.0), the following program contains a possible data race due to unsynchronized read/write of i:

int i{0}; // std::atomic<int> i{0};

void write() {
// #pragma omp atomic write // seq_cst
   i = 1;
}

int read() {
   int j;
// #pragma omp atomic read // seq_cst
   j = i; 
   return j;
}

int main() {
   #pragma omp parallel
   { /* code that calls both write() and read() */ }
}

Possible solutions that came to my mind are shown in the code as comments:

1. to protect write and read of i with #pragma omp atomic write/read,
2. to protect write and read of i with #pragma omp atomic write/read seq_cst,
3. to use std::atomic<int> instead of int as a type of i.

Here are the compiler-generated instructions on x86_64 (with -O2 in all cases):

GNU g++ 4.9.2:               i = 1;        j = i;
original code:               MOV           MOV
#pragma omp atomic:          MOV           MOV
// #pragma omp atomic seq_cst:  MOV           MOV
#pragma omp atomic seq_cst:  MOV+MFENCE    MOV    (see UPDATE)
std::atomic<int>:            MOV+MFENCE    MOV

clang++ 3.5.0:               i = 1;        j = i;
original code:               MOV           MOV
#pragma omp atomic:          MOV           MOV
#pragma omp atomic seq_cst:  MOV           MOV
std::atomic<int>:            XCHG          MOV

Intel icpc 16.0.1:           i = 1;        j = i;
original code:               MOV           MOV
#pragma omp atomic:          *             *
#pragma omp atomic seq_cst:  *             *
std::atomic<int>:            XCHG          MOV

* Multiple instructions with calls to __kmpc_atomic_xxx functions.
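To reproduce a comparison like the table above, one option is to give each variant its own function and inspect the generated assembly. The following is only a minimal sketch, not the file that was actually used for the measurements; all function and variable names (`write_omp`, `read_omp_sc`, `plain_val`, and so on) are made up for illustration.

```cpp
#include <atomic>

// One global per variant so each function touches only its own variable.
// All names here are hypothetical and chosen for this sketch.
int plain_val{0};
int omp_val{0};
int omp_sc_val{0};
std::atomic<int> std_val{0};

// Original code: no synchronization at all.
void write_plain() { plain_val = 1; }
int read_plain() { return plain_val; }

// Option 1: OpenMP atomic without seq_cst.
void write_omp() {
    #pragma omp atomic write
    omp_val = 1;
}
int read_omp() {
    int j;
    #pragma omp atomic read
    j = omp_val;
    return j;
}

// Option 2: OpenMP 4.0 sequentially consistent atomic.
void write_omp_sc() {
    #pragma omp atomic write seq_cst
    omp_sc_val = 1;
}
int read_omp_sc() {
    int j;
    #pragma omp atomic read seq_cst
    j = omp_sc_val;
    return j;
}

// Option 3: std::atomic; assignment and conversion are seq_cst by default.
void write_std() { std_val = 1; }
int read_std() { return std_val; }
```

Compiling this with `g++ -O2 -fopenmp -S`, and once more without `-fopenmp` (in which case the pragmas are ignored), should make it easy to see which functions actually change.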

What I wonder is why the GNU and clang compilers do not generate any special instructions for #pragma omp atomic writes. I would expect instructions similar to those for std::atomic, i.e., either MOV+MFENCE or XCHG. Any explanation?

UPDATE

g++ 5.3.0 produces MFENCE for #pragma omp atomic write seq_cst. That is the correct behavior, I believe. Without seq_cst, it produces plain MOV, which is sufficient for non-SC atomicity.

There was a bug in my Makefile; g++ 4.9.2 produces MFENCE for the SC atomic write as well. Sorry about that.

Clang 3.5.0 does not implement the OpenMP SC atomics; thanks to Hristo Iliev for pointing this out.
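The split between the seq_cst and non-seq_cst rows has a close analogue in `std::atomic`, where the ordering is an explicit argument. As a sketch (the names `flag`, `store_relaxed`, etc. are hypothetical): on x86-64, compilers typically emit a plain `MOV` for relaxed and release stores and reserve `XCHG` or `MOV`+`MFENCE` for sequentially consistent stores, while loads are a plain `MOV` for every ordering. Roughly speaking, an OpenMP atomic without seq_cst only needs the access itself to be indivisible, which is closest to the relaxed case here.

```cpp
#include <atomic>

// Hypothetical shared variable for this sketch.
std::atomic<int> flag{0};

// Atomic but unordered: on x86-64 this is usually a plain MOV.
void store_relaxed(int v) { flag.store(v, std::memory_order_relaxed); }

// Release store: on x86-64 usually still a plain MOV, because the
// hardware does not reorder a store with earlier loads or stores.
void store_release(int v) { flag.store(v, std::memory_order_release); }

// Sequentially consistent store (the default ordering):
// typically XCHG or MOV+MFENCE on x86-64.
void store_seq_cst(int v) { flag.store(v, std::memory_order_seq_cst); }

// Loads are usually a plain MOV on x86-64 regardless of ordering.
int load_relaxed() { return flag.load(std::memory_order_relaxed); }
int load_seq_cst() { return flag.load(std::memory_order_seq_cst); }
```

On architectures with a weaker memory model, the release and seq_cst variants would be expected to need extra barrier instructions even where x86-64 needs none.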


asked Feb 17 '16 16:02

Daniel Langr

1 Answer

There are two possibilities.

1. The compiler is not obligated to convert C++ code containing a data race into bad machine code. Depending on the machine's memory model, the instructions normally used may already be atomic and coherent. Take the same C++ code to another architecture and you may start seeing the pragmas cause differences that didn't exist on x86_64.
2. In addition to potentially causing the use of different instructions and/or extra memory fence instructions, the atomic pragmas (as well as std::atomic and volatile) also constrain the compiler's own code-reordering optimizations. They may not matter in your simple case, but you could certainly see that common-subexpression elimination, including hoisting computations out of a loop, is affected.
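The second point can be illustrated with a polling loop. This is a hedged sketch with hypothetical names (`ready_plain`, `wait_atomic`, and so on), not code from the question: because a data race on a plain `int` is undefined behaviour, the compiler may assume nobody else modifies `ready_plain` and load it only once before the loop, whereas the atomic variants are, in practice, reloaded on every iteration. The loops are capped at `max_polls` so the functions always return.

```cpp
#include <atomic>

// Hypothetical flags, one per variant.
int ready_plain = 0;
int ready_omp = 0;
std::atomic<int> ready_atomic{0};

// Plain read: the load of ready_plain may be hoisted out of the loop,
// so a concurrent write by another thread might never be observed.
// Returns the number of polls made before the flag was seen set.
int wait_plain(int max_polls) {
    int polls = 0;
    while (polls < max_polls && ready_plain == 0) {
        ++polls;
    }
    return polls;
}

// OpenMP atomic read: each iteration performs its own atomic read.
// (Without -fopenmp the pragma is ignored and this behaves like wait_plain.)
int wait_omp(int max_polls) {
    int polls = 0;
    for (; polls < max_polls; ++polls) {
        int seen;
        #pragma omp atomic read
        seen = ready_omp;
        if (seen != 0) break;
    }
    return polls;
}

// std::atomic read: even a relaxed load is a separate atomic operation
// on every iteration.
int wait_atomic(int max_polls) {
    int polls = 0;
    while (polls < max_polls &&
           ready_atomic.load(std::memory_order_relaxed) == 0) {
        ++polls;
    }
    return polls;
}
```

The load instruction can be the same `MOV` in all three functions on x86-64; what may differ is whether it ends up inside or outside the loop.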


answered Oct 03 '22 02:10

Ben Voigt
