For any std::atomic<T> where T is a primitive type: if I blindly use std::memory_order_acq_rel for fetch_xxx operations, std::memory_order_acquire for load operations, and std::memory_order_release for store operations (I mean just resetting the default memory ordering of those functions), will the results be the same as if I had used std::memory_order_seq_cst (the default) for all of the declared operations? And if the results are the same, is this usage any different from std::memory_order_seq_cst in terms of efficiency?

Acquire-release ordering guarantees that all memory operations which happen before the storing operation (for example, y.store(true, std::memory_order_release)) in one thread will be visible to the other thread that performs the corresponding loading operation (here, y.load(std::memory_order_acquire)).
An operation has acquire semantics if other processors will always see its effect before any subsequent operation's effect. An operation has release semantics if other processors will see every preceding operation's effect before the effect of the operation itself.
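A read-modify-write operation can carry both semantics at once via std::memory_order_acq_rel. As an illustration (this reference-counting scheme is my sketch, not part of the answers above), the classic shared-object release uses exactly that:

#include <atomic>

struct Node {
    std::atomic<int> refcount{1};
    // ... payload ...
};

void release_ref(Node* n) {
    // The release half publishes this thread's earlier writes to the object
    // before the count drops; the acquire half lets the thread that drops
    // the count to zero observe every other thread's writes before deleting.
    if (n->refcount.fetch_sub(1, std::memory_order_acq_rel) == 1)
        delete n;
}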
The x86 is not sequentially consistent. The technical explanation is that, instead of one bus serializing all memory accesses, each core uses its own memory cache. Writes propagate from one cache to another with finite speed (measured in clock cycles).
The C++ memory model guarantees sequential consistency only if you use std::memory_order_seq_cst (the default) for the atomic operations involved. If you just use plain non-atomic operations, or relaxed atomics, and no mutexes, then sequential consistency is not guaranteed.
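The classic store-buffering litmus test makes the gap concrete; the variable names below are assumed for illustration. Even with acquire/release orderings, both threads may read 0, an outcome no sequentially consistent interleaving allows:

#include <atomic>

std::atomic<int> x{0}, y{0};
int r1, r2;

void thread1() {
    x.store(1, std::memory_order_release);
    r1 = y.load(std::memory_order_acquire); // may still read 0
}

void thread2() {
    y.store(1, std::memory_order_release);
    r2 = x.load(std::memory_order_acquire); // may still read 0
}
// With release/acquire (or relaxed), r1 == 0 && r2 == 0 is a permitted
// outcome: neither thread's store is ordered before its own later load.
// If every operation above used std::memory_order_seq_cst, the single
// total order of seq_cst operations would force at least one of r1, r2
// to be 1.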
The C++11 memory ordering parameters for atomic operations specify constraints on the ordering. If you do a store with std::memory_order_release, and a load from another thread reads the value with std::memory_order_acquire, then subsequent read operations from the second thread will see any values stored to any memory location by the first thread prior to the store-release, or a later store to any of those memory locations.

If both the store and subsequent load are std::memory_order_seq_cst then the relationship between these two threads is the same. You need more threads to see the difference.
e.g. std::atomic<int> variables x and y, both initially 0.

Thread 1:

x.store(1,std::memory_order_release);

Thread 2:

y.store(1,std::memory_order_release);

Thread 3:

int a=x.load(std::memory_order_acquire); // x before y
int b=y.load(std::memory_order_acquire);

Thread 4:

int c=y.load(std::memory_order_acquire); // y before x
int d=x.load(std::memory_order_acquire);
As written, there is no relationship between the stores to x and y, so it is quite possible to see a==1, b==0 in thread 3, and c==1 and d==0 in thread 4.
If all the memory orderings are changed to std::memory_order_seq_cst then this enforces an ordering between the stores to x and y. Consequently, if thread 3 sees a==1 and b==0 then that means the store to x must be before the store to y, so if thread 4 sees c==1, meaning the store to y has completed, then the store to x must also have completed, so we must have d==1.
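For completeness, here is a runnable sketch that assembles the four threads above (the scaffolding is assumed, not part of the original answer); note that the a==1, b==0, c==1, d==0 outcome is rare, and on strongly ordered hardware such as x86 you may never observe it:

#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> x{0}, y{0};
int a, b, c, d;

int main() {
    std::thread t1([] { x.store(1, std::memory_order_release); });
    std::thread t2([] { y.store(1, std::memory_order_release); });
    std::thread t3([] {
        a = x.load(std::memory_order_acquire); // x before y
        b = y.load(std::memory_order_acquire);
    });
    std::thread t4([] {
        c = y.load(std::memory_order_acquire); // y before x
        d = x.load(std::memory_order_acquire);
    });
    t1.join(); t2.join(); t3.join(); t4.join();
    if (a == 1 && b == 0 && c == 1 && d == 0)
        std::puts("threads 3 and 4 saw the stores in opposite orders");
}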
In practice, using std::memory_order_seq_cst everywhere will add additional overhead to either loads or stores or both, depending on your compiler and processor architecture. e.g. a common technique for x86 processors is to use XCHG instructions rather than MOV instructions for std::memory_order_seq_cst stores, in order to provide the necessary ordering guarantees, whereas for std::memory_order_release a plain MOV will suffice. On systems with more relaxed memory architectures the overhead may be greater, since plain loads and stores have fewer guarantees.
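One way to see this cost is to compare the code a compiler emits for the two store orderings, e.g. in a compiler explorer; the instructions noted below are typical for x86-64 but depend on compiler and target:

#include <atomic>

std::atomic<int> g{0};

void store_release(int v) {
    g.store(v, std::memory_order_release); // x86-64: a plain MOV
}

void store_seq_cst(int v) {
    g.store(v, std::memory_order_seq_cst); // x86-64: typically XCHG (or MOV + MFENCE)
}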
Memory ordering is hard. I devoted almost an entire chapter to it in my book.