Here are four approaches (two mappings, each with an alternative) to achieving sequential consistency on x86/x86_64, as described at http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html:
C/C++11 operation → x86 implementation:
- Load Seq_Cst: MOV (from memory)
- Store Seq_Cst: (LOCK) XCHG // alternative: MOV (into memory), MFENCE

Note: there is an alternative mapping of C/C++11 to x86 which, instead of locking (or fencing) the Seq_Cst store, locks/fences the Seq_Cst load:
- Load Seq_Cst: LOCK XADD(0) // alternative: MFENCE, MOV (from memory)
- Store Seq_Cst: MOV (into memory)
GCC 4.8.2 (disassembled with GDB on x86_64) uses the first approach for C++11 std::memory_order_seq_cst, i.e. a plain LOAD (without fence) and STORE + MFENCE:
std::atomic<int> a;
int temp = 0;
a.store(temp, std::memory_order_seq_cst);
0x4613e8 <+0x0058> mov 0x38(%rsp),%eax
0x4613ec <+0x005c> mov %eax,0x20(%rsp)
0x4613f0 <+0x0060> mfence
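For comparison, a seq_cst load under this first mapping compiles to a plain MOV with no fence at all. A minimal sketch of my own (not from the original question; the exact assembly depends on compiler version and optimization level):

#include <atomic>

std::atomic<int> a;

int load_seq_cst() {
    // Typical GCC/Clang x86_64 output is just:
    //   mov a(%rip),%eax
    //   ret
    // No fence is needed on the load side: the store side
    // (MFENCE or LOCK XCHG) pays the whole cost.
    return a.load(std::memory_order_seq_cst);
}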
As we know, MFENCE = LFENCE + SFENCE. The STORE + MFENCE sequence above could therefore be rewritten as LOAD (without fence) and STORE + LFENCE + SFENCE.

Question: would LOAD (without fence) and STORE + LFENCE + SFENCE also guarantee sequential consistency, or is MFENCE (or LOCK XCHG) required?
The only reordering x86 does (for normal memory accesses) is that it can move a later load ahead of an earlier store (StoreLoad reordering).
SFENCE guarantees that all stores before the fence complete before all stores after the fence. LFENCE guarantees that all loads before the fence complete before all loads after the fence. For normal memory accesses, the ordering guarantees of individual SFENCE or LFENCE operations are already provided by default. Basically, LFENCE and SFENCE by themselves are only useful for the weaker memory access modes of x86.
Neither LFENCE, SFENCE, nor LFENCE + SFENCE prevents a store followed by a load from being reordered. MFENCE does.
The relevant reference is the Intel 64 and IA-32 Architectures Software Developer's Manual (Volume 3A, the memory-ordering chapter).
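To see concretely why StoreLoad reordering is the case that matters, here is the classic store-buffer litmus test as a C++11 sketch (my own illustration, not part of the answer above; with relaxed operations both threads can observe zero, which is exactly the reordering that only a full barrier such as MFENCE, or a seq_cst operation, forbids):

#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> x{0}, y{0};
int r1, r2;

void t1() {
    x.store(1, std::memory_order_relaxed);   // store first ...
    r1 = y.load(std::memory_order_relaxed);  // ... then load: the load may complete
                                             // before the store leaves the store buffer
}

void t2() {
    y.store(1, std::memory_order_relaxed);
    r2 = x.load(std::memory_order_relaxed);
}

int main() {
    std::thread ta(t1), tb(t2);
    ta.join();
    tb.join();
    // The outcome r1 == 0 && r2 == 0 is allowed on x86: each store can sit in
    // the store buffer while the following load executes. An MFENCE (or a
    // seq_cst store) between the store and the load in each thread rules it out.
    std::printf("r1=%d r2=%d\n", r1, r2);
}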
Consider the following code:
#include <atomic>
#include <cstring>

std::atomic<int> a;
char b[64];

void seq() {
    /*
        movl $0, a(%rip)
        mfence
    */
    int temp = 0;
    a.store(temp, std::memory_order_seq_cst);
}

void rel() {
    /*
        movl $0, a(%rip)
    */
    int temp = 0;
    a.store(temp, std::memory_order_relaxed);
}
With respect to the atomic variable a, seq() and rel() are both ordered and atomic on the x86 architecture: no fence is required just to store a constant value into the atomic variable itself. The fence is there because std::memory_order_seq_cst implies that all memory is synchronized, not only the memory that holds the atomic variable.
The effect can be demonstrated by the following set and get functions:
void set(const char *s) {
    strcpy(b, s);                              // plain copy, possibly via SSE
    int temp = 0;
    a.store(temp, std::memory_order_seq_cst);  // publish: fence makes the copy visible
}

const char *get() {
    a.load(std::memory_order_seq_cst);         // acquire: observe writes published via a
    return b;
}
strcpy is a library function that may use newer SSE instructions if they are available at runtime. Because such instructions (in particular non-temporal stores) postdate the original x86 ordering guarantees, they are not required to be strongly ordered, so the memory ordering of the copy is weaker. Thus the result of a strcpy in one thread might not be immediately visible in other threads.
The set and get functions above use an atomic variable to enforce memory synchronization, so that the result of strcpy becomes visible in other threads. This is where the fences matter, but their order inside the call to atomic::store is not significant, since the fences are not needed internally in atomic::store.
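A hypothetical two-thread usage illustrating the same publish/observe pattern (my own sketch, not from the answer; the names ready, buf, writer, and reader are invented here, and the reader spins on the atomic as a flag instead of storing into it):

#include <atomic>
#include <cstdio>
#include <cstring>
#include <thread>

std::atomic<int> ready{0};
char buf[64];

void writer() {
    std::strcpy(buf, "hello");                  // plain (possibly SSE-optimized) copy
    ready.store(1, std::memory_order_seq_cst);  // publish: makes the copy visible
}

void reader() {
    while (ready.load(std::memory_order_seq_cst) == 0) {
        // spin until the writer has published
    }
    std::printf("%s\n", buf);                   // guaranteed to print "hello"
}

int main() {
    std::thread w(writer), r(reader);
    w.join();
    r.join();
}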
SFENCE + LFENCE is not a StoreLoad barrier (MFENCE), so the premise of the question is incorrect. (See also my answer on another version of this same question from the same user: Why is (or isn't?) SFENCE + LFENCE equivalent to MFENCE?)
LFENCE+SFENCE doesn't include anything that stops a store from being buffered until after a later load. MFENCE does prevent this.
Preshing's blog post explains in more detail and with diagrams how StoreLoad barriers are special, and has a practical example of working code that demonstrates reordering without MFENCE. Anyone that's confused about memory ordering should start with that blog.
x86 has a strong memory model where every normal store has release semantics, and every normal load has acquire semantics. This post has the details.
LFENCE and SFENCE only exist for use with movnt (non-temporal) loads/stores, which are weakly ordered as well as bypassing the cache.
In case those links ever die, there's even more info in my answer on another similar question.
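One way to see this concretely (my own sketch, assuming typical GCC/Clang code generation on x86_64) is to compare what the standalone C++11 fences compile to:

#include <atomic>

void acquire_release_fences() {
    // On x86_64 these compile to no instructions at all: ordinary loads and
    // stores already have acquire/release semantics, so LFENCE/SFENCE are
    // not needed for normal memory accesses.
    std::atomic_thread_fence(std::memory_order_acquire);
    std::atomic_thread_fence(std::memory_order_release);
}

void full_fence() {
    // Only the seq_cst fence needs a real instruction, and it is MFENCE
    // (or a dummy LOCKed operation), not LFENCE + SFENCE.
    std::atomic_thread_fence(std::memory_order_seq_cst);
}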