Does atomic_thread_fence(memory_order_seq_cst) have the semantics of a full memory barrier?

A full/general memory barrier is one where all the LOAD and STORE operations specified before the barrier will appear to happen before all the LOAD and STORE operations specified after the barrier with respect to the other components of the system.

According to cppreference, memory_order_seq_cst is equal to memory_order_acq_rel plus a single total modification order on all operations so tagged. But as far as I know, neither an acquire nor a release fence in C++11 enforces a #StoreLoad (load after store) ordering. A release fence requires that no previous read/write can be reordered with any following write; an acquire fence requires that no following read/write can be reordered with any previous read. Please correct me if I am wrong ;)

For example:

atomic<int> x;
atomic<int> y;

y.store(1, memory_order_relaxed);            //(1)
atomic_thread_fence(memory_order_seq_cst);   //(2)
x.load(memory_order_relaxed);                //(3)

Is an optimizing compiler allowed to reorder instruction (3) to before (1), so that it effectively looks like this:

x.load(memory_order_relaxed);                //(3)
y.store(1, memory_order_relaxed);            //(1)
atomic_thread_fence(memory_order_seq_cst);   //(2)

If this is a valid transformation, then it proves that atomic_thread_fence(memory_order_seq_cst) doesn't necessarily have the semantics of a full barrier.

asked Aug 25 '14 01:08

Eric Z

1 Answer

atomic_thread_fence(memory_order_seq_cst) always generates a full-barrier.

x86_64: MFENCE
PowerPC: hwsync
Itanium: mf
ARMv7 / ARMv8: dmb ish
MIPS64: sync

The main caveat: an observing thread can still observe the operations in a different order, no matter which fences you use in the observed thread.

Is an optimizing compiler allowed to reorder instruction (3) to before (1)?

No, it isn't allowed. But in a multithreaded program this ordering is globally visible only if:

other threads use the same memory_order_seq_cst for their atomic read/write operations on these values
or other threads also place the same atomic_thread_fence(memory_order_seq_cst); between their load() and store() operations. This approach does not guarantee sequential consistency in general, though, because sequential consistency is a stronger guarantee.
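The second condition can be illustrated with the classic store-buffering litmus test. The following is only a minimal sketch, not part of the standard or of the mappings discussed here: the helper name count_both_zero and the iteration count are made up for illustration. Both threads put a seq_cst fence between their relaxed store and their relaxed load, so the outcome r1 == 0 && r2 == 0 should never be observed.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper (an assumption for illustration): runs the
// store-buffering litmus test `iterations` times and returns how many
// runs ended with both loads returning 0.
int count_both_zero(int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            y.store(1, std::memory_order_relaxed);                // (1)
            std::atomic_thread_fence(std::memory_order_seq_cst);  // (2)
            r1 = x.load(std::memory_order_relaxed);               // (3)
        });
        std::thread t2([&] {
            // Mirror image of t1: same fence between store and load.
            x.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r2 = y.load(std::memory_order_relaxed);
        });
        t1.join();
        t2.join();

        // r1 and r2 are read only after join(), so there is no data race.
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

Calling count_both_zero(1000) is expected to return 0. A passing run does not prove anything by itself, since the forbidden outcome is rare even without fences; the guarantee comes from the total order on the two seq_cst fences. On some toolchains the program has to be linked with -pthread.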

Working Draft, Standard for Programming Language C++ 2016-07-12: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/n4606.pdf

§ 29.3 Order and consistency

§ 29.3 / 8

[ Note: memory_order_seq_cst ensures sequential consistency only for a program that is free of data races and uses exclusively memory_order_seq_cst operations. Any use of weaker ordering will invalidate this guarantee unless extreme care is used. In particular, memory_order_seq_cst fences ensure a total order only for the fences themselves. Fences cannot, in general, be used to restore sequential consistency for atomic operations with weaker ordering specifications. — end note ]

How this maps to assembly:

Case-1:

atomic<int> x, y;

y.store(1, memory_order_relaxed);            //(1)
atomic_thread_fence(memory_order_seq_cst);   //(2)
x.load(memory_order_relaxed);                //(3)

This code is not always equivalent in meaning to Case-2, but it produces the same instructions between the STORE and the LOAD as when both the STORE and the LOAD use memory_order_seq_cst. That is sequential consistency, which prevents StoreLoad reordering. Case-2:

atomic<int> x, y;

y.store(1, memory_order_seq_cst);            //(1)

x.load(memory_order_seq_cst);                //(3)

With some notes:

it may add duplicate instructions (as in the MIPS64 mapping listed further down)
or it may perform similar operations using other instructions:
- as in the alternative-3/4 mappings for x86_64, where the LOCK prefix flushes the store buffer exactly as MFENCE does, which prevents StoreLoad reordering
- or as on ARMv8, where we know that DMB ISH is a full barrier that prevents StoreLoad reordering: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/CHDGACJD.html

Guide for ARMv8-A

Table 13.1. Barrier parameters

ISH: Any - Any

Any - Any: This means that both loads and stores must complete before the barrier. Both loads and stores that appear after the barrier in program order must wait for the barrier to complete.

Reordering of two instructions can be prevented by placing additional instructions between them. As we can see, the instructions generated between a STORE(seq_cst) and the following LOAD(seq_cst) are the same as those generated for FENCE(seq_cst) (atomic_thread_fence(memory_order_seq_cst)).
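To compare the two cases on your own compiler and target, a small translation unit like the one below can be compiled to assembly (for example with g++ -O2 -S, or on Compiler Explorer). This is a sketch under assumptions: the function names case1 and case2 are made up here, and the instructions you actually see depend on the compiler, its version, and the architecture.

```cpp
#include <atomic>

std::atomic<int> x{0};
std::atomic<int> y{0};

// Case-1: relaxed store, explicit seq_cst fence, relaxed load.
int case1() {
    y.store(1, std::memory_order_relaxed);                // (1)
    std::atomic_thread_fence(std::memory_order_seq_cst);  // (2)
    return x.load(std::memory_order_relaxed);             // (3)
}

// Case-2: seq_cst store followed by seq_cst load, no explicit fence.
int case2() {
    y.store(1, std::memory_order_seq_cst);                // (1)
    return x.load(std::memory_order_seq_cst);             // (3)
}
```

Looking at what the compiler emits between the store and the load in each function shows whether a barrier instruction (or an equivalent, such as a LOCK-prefixed instruction on x86_64) separates them.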

Mapping of C/C++11 memory_order_seq_cst to different CPU architectures for load(), store(), and atomic_thread_fence():

Note that atomic_thread_fence(memory_order_seq_cst); always generates a full barrier:

x86_64: STORE: MOV (into memory), MFENCE | LOAD: MOV (from memory) | fence: MFENCE
x86_64-alt: STORE: MOV (into memory) | LOAD: MFENCE, MOV (from memory) | fence: MFENCE
x86_64-alt3: STORE: (LOCK) XCHG | LOAD: MOV (from memory) | fence: MFENCE (full barrier)
x86_64-alt4: STORE: MOV (into memory) | LOAD: LOCK XADD(0) | fence: MFENCE (full barrier)
PowerPC: STORE: hwsync; st | LOAD: hwsync; ld; cmp; bc; isync | fence: hwsync
Itanium: STORE: st.rel; mf | LOAD: ld.acq | fence: mf
ARMv7: STORE: dmb ish; str; dmb ish | LOAD: ldr; dmb ish | fence: dmb ish
ARMv7-alt: STORE: dmb ish; str | LOAD: dmb ish; ldr; dmb ish | fence: dmb ish
ARMv8(AArch32): STORE: STL | LOAD: LDA | fence: DMB ISH (full barrier)
ARMv8(AArch64): STORE: STLR | LOAD: LDAR | fence: DMB ISH (full barrier)
MIPS64: STORE: sync; sw; sync | LOAD: sync; lw; sync | fence: sync

All mappings of C/C++11 semantics to different CPU architectures for load(), store(), and atomic_thread_fence() are described here: http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html

Because sequential consistency prevents StoreLoad reordering, and because the instructions generated between a store(memory_order_seq_cst) and the next load(memory_order_seq_cst) are the same as those generated for atomic_thread_fence(memory_order_seq_cst), it follows that atomic_thread_fence(memory_order_seq_cst) prevents StoreLoad reordering.

answered Oct 16 '22 06:10

Alex

Tags: c++, memory-model, stdatomic, memory-barriers, relaxed-atomics
