Does standard C++11 guarantee that memory_order_seq_cst prevents StoreLoad reordering around an atomic operation for non-atomic memory accesses?
As is known, there are 6 std::memory_order values in C++11, and the standard specifies how regular, non-atomic memory accesses are to be ordered around an atomic operation - Working Draft, Standard for Programming Language C++ 2016-07-12: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/n4606.pdf
§ 29.3 Order and consistency
§ 29.3 / 1
The enumeration memory_order specifies the detailed regular (non-atomic) memory synchronization order as defined in 1.10 and may provide for operation ordering. Its enumerated values and their meanings are as follows:
It is also known that these 6 memory orders prevent some of these reorderings:
But does memory_order_seq_cst prevent StoreLoad reordering around an atomic operation for regular, non-atomic memory accesses, or only for other atomic operations with the same memory_order_seq_cst?
I.e. to prevent this StoreLoad reordering, should we use std::memory_order_seq_cst for both the STORE and the LOAD, or only for one of them?
std::atomic<int> a, b;
b.store(1, std::memory_order_seq_cst); // Sequential Consistency
a.load(std::memory_order_seq_cst); // Sequential Consistency
Acquire-release semantics are clear: they specify exactly how non-atomic memory accesses may be reordered across atomic operations: http://en.cppreference.com/w/cpp/atomic/memory_order
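As a minimal sketch of what acquire-release does guarantee for regular memory, the message-passing pattern below publishes a non-atomic value through a release store and reads it after an acquire load. The names payload, ready and message_passing_once are hypothetical and chosen only for illustration.

```cpp
#include <atomic>
#include <cassert>  // only needed for the assert in the usage note
#include <thread>

// Hypothetical helper: returns the value the consumer read from the
// non-atomic variable after seeing the flag set.
int message_passing_once() {
    int payload = 0;                 // regular, non-atomic data
    std::atomic<bool> ready{false};  // flag used for synchronization
    int seen = -1;

    std::thread producer([&] {
        payload = 42;  // non-atomic store
        // Release: the store to payload cannot be reordered after this store.
        ready.store(true, std::memory_order_release);
    });
    std::thread consumer([&] {
        // Acquire: the load of payload cannot be reordered before this load.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // non-atomic load, happens after payload = 42
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Calling assert(message_passing_once() == 42); from main should never fire, because the release store synchronizes with the acquire load that reads true. Note that this pattern only needs StoreStore and LoadLoad/LoadStore ordering; it says nothing about StoreLoad.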
To prevent StoreLoad reordering we should use std::memory_order_seq_cst.
Two examples:
std::memory_order_seq_cst for both STORE and LOAD: there is an MFENCE, so StoreLoad can't be reordered - GCC 6.1.0 x86_64: https://godbolt.org/g/mVZJs0
std::atomic<int> a, b;
b.store(1, std::memory_order_seq_cst); // can't be executed after LOAD
a.load(std::memory_order_seq_cst); // can't be executed before STORE
std::memory_order_seq_cst for LOAD only: there is no MFENCE, so StoreLoad can be reordered - GCC 6.1.0 x86_64: https://godbolt.org/g/2NLy12
std::atomic<int> a, b;
b.store(1, std::memory_order_release); // can be executed after LOAD
a.load(std::memory_order_seq_cst); // can be executed before STORE
Also, if the C/C++ compiler uses the alternative mapping of C/C++11 to x86, which flushes the store buffer before the LOAD (MFENCE, MOV (from memory)), then we must use std::memory_order_seq_cst for the LOAD too: http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html This example is discussed in another question as approach (3): Does it make any sense instruction LFENCE in processors x86/x86_64?
I.e. we should use std::memory_order_seq_cst for both STORE and LOAD to guarantee that an MFENCE is generated, which prevents StoreLoad reordering.
Is it true that memory_order_seq_cst for an atomic load or store:
specifies acquire-release semantics, i.e. prevents LoadLoad, LoadStore and StoreStore reordering around an atomic operation for regular, non-atomic memory accesses,
but prevents StoreLoad reordering around an atomic operation only for other atomic operations with the same memory_order_seq_cst?
The default is std::memory_order_seq_cst which establishes a single total ordering over all atomic operations tagged with this tag: all threads see the same order of such atomic operations and no memory_order_seq_cst atomic operations can be reordered.
The problem is that atomic operations on their own don't prevent reordering. We need an additional concept for atomics to do this. In C11, atomic operations take in another parameter called "memory ordering" which helps solve this problem.
No, standard C++11 doesn't guarantee that memory_order_seq_cst prevents StoreLoad reordering of non-atomic accesses around an atomic(seq_cst) operation.
Standard C++11 doesn't even guarantee that memory_order_seq_cst prevents StoreLoad reordering of atomic(non-seq_cst) operations around an atomic(seq_cst) operation.
Working Draft, Standard for Programming Language C++ 2016-07-12: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/n4606.pdf
All memory_order_seq_cst operations have a single total order - C++11 Standard: § 29.3 / 3
There shall be a single total order S on all memory_order_seq_cst operations, consistent with the “happens before” order and modification orders for all affected locations, such that each memory_order_seq_cst operation B that loads a value from an atomic object M observes one of the following values: ...
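A minimal sketch of what the single total order S means in practice is the "independent reads of independent writes" test below: when every operation is seq_cst, two readers can never disagree about the order of two independent stores. The function name count_iriw_disagreements and the trial count are hypothetical and chosen for illustration.

```cpp
#include <atomic>
#include <cassert>  // only needed for the assert in the usage note
#include <thread>

// Hypothetical helper: counts trials in which the two readers observed the
// two independent seq_cst stores in opposite orders.
int count_iriw_disagreements(int trials) {
    int disagreements = 0;
    for (int i = 0; i < trials; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = 0, r2 = 0, r3 = 0, r4 = 0;

        std::thread w1([&] { x.store(1, std::memory_order_seq_cst); });
        std::thread w2([&] { y.store(1, std::memory_order_seq_cst); });
        std::thread rd1([&] {
            r1 = x.load(std::memory_order_seq_cst);
            r2 = y.load(std::memory_order_seq_cst);
        });
        std::thread rd2([&] {
            r3 = y.load(std::memory_order_seq_cst);
            r4 = x.load(std::memory_order_seq_cst);
        });
        w1.join();
        w2.join();
        rd1.join();
        rd2.join();

        // rd1 saw "x before y" while rd2 saw "y before x":
        // impossible if all six operations are in one total order.
        if (r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0) {
            ++disagreements;
        }
    }
    return disagreements;
}
```

With all operations tagged seq_cst, assert(count_iriw_disagreements(500) == 0); should hold on a conforming implementation. The guarantee covers only the seq_cst operations themselves, which is the point of the next quote.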
But operations that are not memory_order_seq_cst have neither sequential consistency nor a single total order, i.e. non-memory_order_seq_cst operations can be reordered with memory_order_seq_cst operations in the allowed directions - C++11 Standard: § 29.3
8 [ Note: memory_order_seq_cst ensures sequential consistency only for a program that is free of data races and uses exclusively memory_order_seq_cst operations. Any use of weaker ordering will invalidate this guarantee unless extreme care is used. In particular, memory_order_seq_cst fences ensure a total order only for the fences themselves. Fences cannot, in general, be used to restore sequential consistency for atomic operations with weaker ordering specifications. — end note ]
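The note above can be illustrated with the classic store-buffering litmus test. This is a sketch under stated assumptions: the template count_both_zero and the trial counts are hypothetical names and values introduced here. Each thread stores to one variable and then loads the other; the outcome r1 == 0 && r2 == 0 requires a StoreLoad reordering in at least one thread.

```cpp
#include <atomic>
#include <cassert>  // only needed for the asserts in the usage note
#include <thread>

// Hypothetical helper: runs the store-buffering test `trials` times and
// counts how often both threads read 0 (i.e. StoreLoad reordering was seen).
template <std::memory_order StoreOrder, std::memory_order LoadOrder>
int count_both_zero(int trials) {
    int both_zero = 0;
    for (int i = 0; i < trials; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            x.store(1, StoreOrder);  // STORE
            r1 = y.load(LoadOrder);  // LOAD from a different location
        });
        std::thread t2([&] {
            y.store(1, StoreOrder);
            r2 = x.load(LoadOrder);
        });
        t1.join();
        t2.join();

        if (r1 == 0 && r2 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

With seq_cst on both the store and the load, count_both_zero<std::memory_order_seq_cst, std::memory_order_seq_cst>(1000) must return 0. With a release store and a seq_cst load, the standard permits a non-zero count; whether it is actually observed depends on the hardware, the compiler and timing, so a zero count in that configuration proves nothing.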
C++ compilers also allow such reorderings:
Usually, if the compiler implements seq_cst as a barrier after the store, then:
STORE-C(relaxed);
LOAD-B(seq_cst);
can be reordered to:
LOAD-B(seq_cst);
STORE-C(relaxed);
Screenshot of Asm generated by GCC 7.0 x86_64: https://godbolt.org/g/4yyeby
It is also theoretically possible, if the compiler implements seq_cst as a barrier before the load, that:
STORE-A(seq_cst);
LOAD-C(acquire);
can be reordered to:
LOAD-C(acquire);
STORE-A(seq_cst);
and that:
STORE-A(seq_cst);
LOAD-C(relaxed);
can be reordered to:
LOAD-C(relaxed);
STORE-A(seq_cst);
On PowerPC the following reordering is also possible:
STORE-A(seq_cst);
STORE-C(relaxed);
can be reordered to:
STORE-C(relaxed);
STORE-A(seq_cst);
If even atomic variables are allowed to be reordered across atomic(seq_cst), then non-atomic variables can also be reordered across atomic(seq_cst).
Screenshot of Asm generated by GCC 4.8 PowerPC: https://godbolt.org/g/BTQBr8
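If the StoreLoad ordering is needed for operations that are not seq_cst, one option in standard C++11 is an explicit std::atomic_thread_fence(std::memory_order_seq_cst) between the store and the load. The sketch below applies this to the store-buffering pattern only; the function name count_both_zero_with_fences is hypothetical, and, as the standard's note says, fences do not restore sequential consistency in general.

```cpp
#include <atomic>
#include <cassert>  // only needed for the assert in the usage note
#include <thread>

// Hypothetical helper: store-buffering test with relaxed accesses separated
// by seq_cst fences; counts trials in which both threads read 0.
int count_both_zero_with_fences(int trials) {
    int both_zero = 0;
    for (int i = 0; i < trials; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            x.store(1, std::memory_order_relaxed);
            // The two fences are totally ordered with each other.
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_relaxed);
        });
        t1.join();
        t2.join();

        // Whichever fence comes second in the total order, the load after it
        // must observe the store sequenced before the other fence.
        if (r1 == 0 && r2 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

Under the C++11 fence rules in § 29.3, assert(count_both_zero_with_fences(1000) == 0); should hold. On x86_64 the fence typically compiles to MFENCE, which is the instruction missing from the reordering cases listed above.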
More details:
STORE-C(release);
LOAD-B(seq_cst);
can be reordered to:
LOAD-B(seq_cst);
STORE-C(release);
This is allowed by Intel® 64 and IA-32 Architectures, 8.2.3.4 Loads May Be Reordered with Earlier Stores to Different Locations.
I.e. x86_64 code:
STORE-A(seq_cst);
STORE-C(release);
LOAD-B(seq_cst);
Can be reordered to:
STORE-A(seq_cst);
LOAD-B(seq_cst);
STORE-C(release);
This can happen because there is no mfence between c.store and b.load:
x86_64 - GCC 7.0: https://godbolt.org/g/dRGTaO
C++ & asm - code:
#include <atomic>
// Atomic load-store
void test() {
    std::atomic<int> a, b, c;
    a.store(2, std::memory_order_seq_cst);        // movl 2,[a]; mfence;
    c.store(4, std::memory_order_release);        // movl 4,[c];
    int tmp = b.load(std::memory_order_seq_cst);  // movl [b],[tmp];
}
It can be reordered to:
#include <atomic>
// Atomic load-store
void test() {
    std::atomic<int> a, b, c;
    a.store(2, std::memory_order_seq_cst);        // movl 2,[a]; mfence;
    int tmp = b.load(std::memory_order_seq_cst);  // movl [b],[tmp];
    c.store(4, std::memory_order_release);        // movl 4,[c];
}
Also, Sequential Consistency in x86/x86_64 can be implemented in four ways: http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
1. LOAD (without fence) and STORE + MFENCE
2. LOAD (without fence) and LOCK XCHG
3. MFENCE + LOAD and STORE (without fence)
4. LOCK XADD (0) and STORE (without fence)
Ways 1 and 2: LOAD and (STORE + MFENCE)/(LOCK XCHG) - we reviewed above.
Ways 3 and 4: (MFENCE + LOAD)/LOCK XADD and STORE - allow the following reorderings:
STORE-A(seq_cst);
LOAD-C(acquire);
can be reordered to:
LOAD-C(acquire);
STORE-A(seq_cst);
and:
STORE-A(seq_cst);
LOAD-C(relaxed);
can be reordered to:
LOAD-C(relaxed);
STORE-A(seq_cst);
PowerPC allows Store-Load reordering (Table 5 - PowerPC, column "Stores Reordered After Loads"): http://www.rdrop.com/users/paulmck/scalability/paper/whymb.2010.06.07c.pdf
I.e. PowerPC code:
STORE-A(seq_cst);
STORE-C(relaxed);
LOAD-C(relaxed);
LOAD-B(seq_cst);
Can be reordered to:
LOAD-C(relaxed);
STORE-A(seq_cst);
STORE-C(relaxed);
LOAD-B(seq_cst);
PowerPC - GCC 4.8 : https://godbolt.org/g/xowFD3
C++ & asm - code:
#include <atomic>
// Atomic load-store
void test() {
    std::atomic<int> a, b, c;                     // addr: 20, 24, 28
    a.store(2, std::memory_order_seq_cst);        // li r9<-2; sync; stw r9->[a];
    c.store(4, std::memory_order_relaxed);        // li r9<-4; stw r9->[c];
    c.load(std::memory_order_relaxed);            // lwz r9<-[c];
    int tmp = b.load(std::memory_order_seq_cst);  // sync; lwz r9<-[b]; ... isync;
}
By dividing a.store into two parts, it can be reordered to:
#include <atomic>
// Atomic load-store
void test() {
    std::atomic<int> a, b, c;                     // addr: 20, 24, 28
    //a.store(2, std::memory_order_seq_cst);      // part-1: li r9<-2; sync;
    c.load(std::memory_order_relaxed);            // lwz r9<-[c];
    a.store(2, std::memory_order_seq_cst);        // part-2: stw r9->[a];
    c.store(4, std::memory_order_relaxed);        // li r9<-4; stw r9->[c];
    int tmp = b.load(std::memory_order_seq_cst);  // sync; lwz r9<-[b]; ... isync;
}
Here the load-from-memory lwz r9<-[c]; is executed earlier than the store-to-memory stw r9->[a];.
On PowerPC the following reordering is also possible:
STORE-A(seq_cst);
STORE-C(relaxed);
can be reordered to:
STORE-C(relaxed);
STORE-A(seq_cst);
This is because PowerPC has a weak memory ordering model that allows Store-Store reordering (Table 5 - PowerPC, column "Stores Reordered After Stores"): http://www.rdrop.com/users/paulmck/scalability/paper/whymb.2010.06.07c.pdf
I.e. on PowerPC a Store can be reordered with another Store, so the previous example can be reordered as follows:
#include <atomic>
// Atomic load-store
void test() {
    std::atomic<int> a, b, c;                     // addr: 20, 24, 28
    //a.store(2, std::memory_order_seq_cst);      // part-1: li r9<-2; sync;
    c.load(std::memory_order_relaxed);            // lwz r9<-[c];
    c.store(4, std::memory_order_relaxed);        // li r9<-4; stw r9->[c];
    a.store(2, std::memory_order_seq_cst);        // part-2: stw r9->[a];
    int tmp = b.load(std::memory_order_seq_cst);  // sync; lwz r9<-[b]; ... isync;
}
Here the store-to-memory stw r9->[c]; is executed earlier than the store-to-memory stw r9->[a];.