Suppose we have one simple variable (std::atomic<int> var) and 2 threads T1 and T2, and we have the following code for T1:
...
var.store(2, mem_order);
...
and for T2:
...
var.load(mem_order);
...
Also let's assume that T2 (load) executes 123ns later in time (later in the modification order in terms of the C++ standard) than T1 (store).
My understanding of this situation is as follows (for different memory orders):
memory_order_seq_cst - T2's load is obliged to load 2. So effectively it has to load the latest value (just as is the case with the RMW operations).
memory_order_acquire/memory_order_release/memory_order_relaxed - T2 is not obliged to load 2 but can load any older value, with the only restriction that the value should not be older than the latest one loaded by that thread. So, for example, var.load returns 0.
Am I right with my understanding?
UPDATE1:
If I'm wrong with the reasoning, please provide the text from the C++ standard which proves it, not just theoretical reasoning about how some architecture might work.
In C++11, it's possible to achieve acquire and release semantics on Ready without issuing explicit fence instructions. You just need to specify memory ordering constraints directly on the operations on Ready: think of it as rolling each fence instruction into the operations on Ready themselves.
What's cool is that neither acquire nor release semantics requires the use of a #StoreLoad barrier, which is often a more expensive memory barrier type.
One case in which the #LoadStore part becomes essential is when using acquire and release semantics to implement a (mutex) lock. In fact, this is where the names come from: acquiring a lock implies acquire semantics, while releasing a lock implies release semantics!
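To make the lock remark concrete, here is a minimal sketch of a spinlock built on acquire and release. The names SpinLock and count_with_spinlock are hypothetical, invented for this illustration, and the busy-wait loop is not something you would want in production code.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical minimal spinlock, for illustration only.
class SpinLock {
public:
    void lock() {
        // Acquire: accesses inside the critical section cannot be
        // reordered before the point where the lock is taken.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();  // wait for the holder to clear the flag
        }
    }
    void unlock() {
        // Release: accesses inside the critical section cannot be
        // reordered after the point where the lock is given up.
        flag_.clear(std::memory_order_release);
    }
private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Two threads increment a plain int under the lock; returns the final count.
int count_with_spinlock(int per_thread) {
    SpinLock lock;
    int counter = 0;  // deliberately non-atomic, protected only by the lock
    auto work = [&] {
        for (int i = 0; i < per_thread; ++i) {
            lock.lock();
            ++counter;
            lock.unlock();
        }
    };
    std::thread a(work), b(work);
    a.join();
    b.join();
    return counter;
}
```

Calling count_with_spinlock(10000) from main should give 20000, since every increment happens inside the critical section.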
Acquire and release fences, as you might imagine, are standalone memory fences, which means that they aren't coupled with any particular memory operation. So, how do they work? An acquire fence prevents the memory reordering of any read which precedes it in program order with any read or write which follows it in program order.
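As a rough sketch of the standalone-fence variant: the flag itself is accessed with memory_order_relaxed and the ordering comes from std::atomic_thread_fence. The names ready, payload and pass_with_fences are assumptions made up for this example (ready only stands in for the Ready flag mentioned above, whose definition is not shown here).

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                 // plain data (hypothetical name)
std::atomic<bool> ready{false};  // stand-in for the Ready flag

// Passes a value from a producer to a consumer using fences only.
int pass_with_fences() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;

    std::thread producer([] {
        payload = 42;
        // Release fence: the write to payload cannot be reordered
        // past the relaxed store that follows the fence.
        std::atomic_thread_fence(std::memory_order_release);
        ready.store(true, std::memory_order_relaxed);
    });

    std::thread consumer([&seen] {
        while (!ready.load(std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
        // Acquire fence: the read of payload cannot be reordered
        // before the relaxed load that precedes the fence.
        std::atomic_thread_fence(std::memory_order_acquire);
        seen = payload;
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Once the consumer has observed ready == true, the two fences synchronize with each other, so pass_with_fences() returns 42.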
Am I right with my understanding?
No. You misunderstand memory orders.
let's assume that T2 (load) executes 123ns later than T1 (store)...
In that case, T2 will see what T1 does with any memory order (moreover, this property applies to reads and writes of any memory region, see e.g. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4431.pdf, 1.10, p.15). The key word in your phrase is "later": it means that someone else forces the ordering of these operations.
Memory orders are used for another scenario:
Let some operation OP1 come in thread T1 before the store operation and OP2 come after it; let OP3 come in thread T2 before the load operation and OP4 come after it.
//T1:                      //T2:
OP1                        OP3
var.store(2, mem_order)    var.load(mem_order)
OP2                        OP4
Assume that some order between var.store() and var.load() can be observed by the threads. What can one guarantee about the cross-thread order of the other operations?
If var.store uses memory_order_release, var.load uses memory_order_acquire, and var.store is ordered before var.load (that is, the load returns 2), then the effect of OP1 is ordered before OP4. E.g., if OP1 writes some variable var1 and OP4 reads that variable, then one can be assured that OP4 will read what OP1 wrote before. This is the most utilized case.
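A minimal sketch of the release/acquire case, assuming OP1 is a plain write to var1 and OP4 is a plain read of it. The function name release_acquire_demo and the value 7 are made up for the example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Returns what OP4 read from var1 after the acquire load observed 2.
int release_acquire_demo() {
    std::atomic<int> var{0};
    int var1 = 0;         // plain variable written by OP1
    int op4_result = -1;

    std::thread t1([&] {
        var1 = 7;                                  // OP1
        var.store(2, std::memory_order_release);
    });

    std::thread t2([&] {
        // Spin until the load returns 2, i.e. until the store is
        // observed to be ordered before the load.
        while (var.load(std::memory_order_acquire) != 2) {
            std::this_thread::yield();
        }
        op4_result = var1;                         // OP4
    });

    t1.join();
    t2.join();
    return op4_result;
}
```

Because the acquire load that returns 2 synchronizes with the release store, release_acquire_demo() returns 7. With memory_order_relaxed on both operations the read of var1 would be a data race.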
If var.store and var.load use memory_order_seq_cst and var.store is ordered after var.load (that is, the load returns 0, which was the value of the variable before the store), then the effect of OP2 is ordered after OP3. This memory order is required by some tricky synchronization schemes.
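One such scheme can be sketched as the classic "store buffering" test. This sketch assumes that OP2 and OP3 are themselves memory_order_seq_cst operations on a second atomic, here called other; that name, Outcome, store_buffering_once and never_both_stale are all hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Outcome {
    int op2_saw;   // value OP2 read from `other`
    int load_saw;  // value var.load() returned
};

Outcome store_buffering_once() {
    std::atomic<int> var{0};
    std::atomic<int> other{0};  // second atomic, assumed for this sketch
    int r_other = -1;
    int r_var = -1;

    std::thread t1([&] {
        var.store(2, std::memory_order_seq_cst);
        r_other = other.load(std::memory_order_seq_cst);  // OP2
    });
    std::thread t2([&] {
        other.store(1, std::memory_order_seq_cst);        // OP3
        r_var = var.load(std::memory_order_seq_cst);
    });

    t1.join();
    t2.join();
    return {r_other, r_var};
}

// If var.load() returned 0, the load precedes the store in the single
// total order, so OP3 precedes OP2 and OP2 must have read 1.
bool never_both_stale(int rounds) {
    for (int i = 0; i < rounds; ++i) {
        Outcome o = store_buffering_once();
        if (o.load_saw == 0 && o.op2_saw == 0) {
            return false;
        }
    }
    return true;
}
```

With weaker orders on these four operations the outcome "both threads read the old value" is allowed; with memory_order_seq_cst on all of them never_both_stale should always return true.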
If var.store or var.load uses memory_order_relaxed, then whatever the order of var.store and var.load, one can guarantee no order of the cross-thread operations. This memory order is used when someone else ensures the order of operations. E.g., if the creation of thread T2 comes after var.store in T1, then OP3 and OP4 are ordered after OP1.
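A small sketch of the thread-creation example, where the calling thread plays the role of T1. The function name relaxed_with_thread_creation, the variable var1 and the value 7 are assumptions for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Returns {value seen by var.load(), value seen by OP4}.
std::pair<int, int> relaxed_with_thread_creation() {
    std::atomic<int> var{0};
    int var1 = 0;
    int seen_var = -1;
    int seen_var1 = -1;

    var1 = 7;                                  // OP1 (in "T1")
    var.store(2, std::memory_order_relaxed);

    // Completion of the std::thread constructor synchronizes with the
    // start of the new thread, so everything above is ordered before
    // everything T2 does, even though the atomic accesses are relaxed.
    std::thread t2([&] {
        seen_var = var.load(std::memory_order_relaxed);
        seen_var1 = var1;                      // OP4
    });
    t2.join();

    return {seen_var, seen_var1};
}
```

Here the ordering comes from thread creation (and from join() on the way back), not from the memory order of the atomic operations, so the result is {2, 7}.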
UPDATE: "123 ns later" implies that *someone else* forces the ordering, because a computer's processor has no notion of universal time, and no operation has a precise moment at which it is executed. To measure the time between two operations you should:
Transitively, these steps establish an ordering between the first operation and the second one.
Having found no arguments that prove my understanding wrong, I deem it correct, and my proof is as follows:
memory_order_seq_cst - T2 load is obliged to load 2.
That's correct because all operations using memory_order_seq_cst should form the single total order on the atomic variable of all the memory operations.
Excerpt from the standard:
[29.3/3] There shall be a single total order S on all memory_order_seq_cst operations, consistent with the "happens before" order and modification orders for all affected locations, such that each memory_order_seq_cst operation B that loads a value from an atomic object M observes one of the following values <...>
The next point of my question:
memory_order_acquire/memory_order_release/memory_order_relaxed - T2 is not obliged to load 2 but can load any older value <...>
I didn't find any evidence that a load executed later in the modification order should see the latest value. The only points I found for store/load operations with any memory order other than memory_order_seq_cst are these:
[29.3/12] Implementations should make atomic stores visible to atomic loads within a reasonable amount of time.
and
[1.10/28] An implementation should ensure that the last value (in modification order) assigned by an atomic or synchronization operation will become visible to all other threads in a finite period of time.
So the only guarantee we have is that the written variable will become visible within some time. That's a pretty reasonable guarantee, but it doesn't imply immediate visibility of the previous store. And it proves my second point.
Given all that, my initial understanding was correct.