
Should load-acquire see store-release immediately?

Suppose we have a single atomic variable (std::atomic<int> var) and two threads, T1 and T2, with the following code for T1:

...
var.store(2, mem_order);
...

and for T2

...
var.load(mem_order)
...

Also let's assume that the T2 load executes 123 ns later in time (later in the modification order, in terms of the C++ standard) than the T1 store. My understanding of this situation is as follows (for the different memory orders):

  1. memory_order_seq_cst - the T2 load is obliged to load 2. Effectively it has to load the latest value (just as is the case with RMW operations).
  2. memory_order_acquire/memory_order_release/memory_order_relaxed - T2 is not obliged to load 2 but may load any older value, with the only restriction that the value must not be older than the latest one already loaded by that thread. So, for example, var.load may return 0.
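For concreteness, the scenario can be written as a minimal, compilable sketch (the run_once helper and the thread bodies are my own illustration, not from the question). The load inside T2 may legitimately return either 0 or 2; only the final load in the parent, which is ordered after both threads by the joins, is guaranteed to read 2:

```cpp
#include <atomic>
#include <thread>

std::atomic<int> var{0};

int run_once() {
    // T1: publishes the value 2.
    std::thread t1([] { var.store(2, std::memory_order_release); });
    // T2: nothing orders this load after the store, so it may see 0 or 2.
    std::thread t2([] { (void)var.load(std::memory_order_acquire); });
    t1.join();
    t2.join();
    // The joins order this load after the store, so it must read 2.
    return var.load(std::memory_order_relaxed);
}
```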

Am I right with my understanding?

UPDATE1:

If I'm wrong in my reasoning, please provide the text from the C++ standard which proves it, not just theoretical reasoning about how some architecture might work.

asked Jun 11 '15 by ixSci



2 Answers

Am I right with my understanding?

No. You misunderstand memory orders.

let's assume that T2(load) executes 123ns later than T1(store)...

In that case, T2 will see what T1 did, with any memory order (moreover, this property applies to reads/writes of any memory location; see e.g. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4431.pdf, 1.10, p.15). The key word in your phrase is later: it means that someone else forces an ordering of these operations.

Memory orders are used for a different scenario:

Let some operation OP1 come in thread T1 before the store operation and OP2 come after it; let OP3 come in thread T2 before the load operation and OP4 come after it:

//T1:                         //T2:
OP1                           OP3
var.store(2, mem_order)       var.load(mem_order)
OP2                           OP4

Assume that some order between var.store() and var.load() can be observed by the threads. What can one guarantee about the cross-thread order of the other operations?

  1. If var.store uses memory_order_release, var.load uses memory_order_acquire, and var.store is ordered before var.load (that is, the load returns 2), then the effect of OP1 is ordered before OP4.

E.g., if OP1 writes some variable var1 and OP4 reads that variable, then one can be sure that OP4 will read what OP1 wrote before. This is the most common case.

  2. If both var.store and var.load use memory_order_seq_cst and var.store is ordered after var.load (that is, the load returns 0, the value of the variable before the store), then the effect of OP2 is ordered after OP3.

This memory order is required by some tricky synchronization schemes.

  3. If either var.store or var.load uses memory_order_relaxed, then with any order of var.store and var.load one can guarantee no cross-thread order of the other operations.

This memory order is used when someone else ensures the order of operations. E.g., if the creation of thread T2 comes after var.store in T1, then OP3 and OP4 are ordered after OP1.
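Case 1 above is the classic message-passing pattern. Here is a minimal sketch of it (the names var1 and acquire_release_demo and the spin loop are illustrative assumptions, not from the answer). OP1 writes the plain variable var1, the release store publishes it, and once the acquire load observes 2, OP4 is guaranteed to see OP1's write:

```cpp
#include <atomic>
#include <thread>

int var1 = 0;               // plain, non-atomic data (written by OP1)
std::atomic<int> var{0};

int acquire_release_demo() {
    std::thread t1([] {
        var1 = 42;                                // OP1
        var.store(2, std::memory_order_release);  // publish
    });
    int seen = 0;
    std::thread t2([&] {
        // Spin until the store is observed; acquire pairs with the release.
        while (var.load(std::memory_order_acquire) != 2) {}
        seen = var1;                              // OP4: guaranteed to read 42
    });
    t1.join();
    t2.join();
    return seen;
}
```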

UPDATE: 123 ns later implies that *someone else* forces the ordering, because a computer's processor has no notion of universal time, and no operation has a precise moment at which it is executed. To measure the time between two operations you would have to:

  1. Observe an ordering between the finish of the first operation and the start of a time-counting operation on some CPU.
  2. Observe an ordering between the start and the finish of the time-counting operation.
  3. Observe an ordering between the finish of the time-counting operation and the start of the second operation.

Transitively, these steps establish an ordering between the first operation and the second.

answered Oct 19 '22 by Tsyvarev


Having found no arguments proving my understanding wrong, I deem it correct; my proof is as follows:

memory_order_seq_cst - T2 load is obliged to load 2.

That's correct because all operations using memory_order_seq_cst must form a single total order, consistent with the modification orders of the affected atomic objects. Excerpt from the standard:

[29.9/3] There shall be a single total order S on all memory_order_seq_cst operations, consistent with the “happens before” order and modification orders for all affected locations, such that each memory_order_seq_cst operation B that loads a value from an atomic object M observes one of the following values <...>

The next point of my question:

memory_order_acquire/memory_order_release/memory_order_relaxed - T2 is not obliged to load 2 but can load any older value <...>

I didn't find any evidence indicating that a load executed later in the modification order must see the latest value. The only relevant passages I found for store/load operations with any memory order other than memory_order_seq_cst are these:

[29.3/12] Implementations should make atomic stores visible to atomic loads within a reasonable amount of time.

and

[1.10/28] An implementation should ensure that the last value (in modification order) assigned by an atomic or synchronization operation will become visible to all other threads in a finite period of time.

So the only guarantee we have is that the value written will become visible within some finite time - a pretty reasonable guarantee, but it does not imply immediate visibility of the previous store. And that proves my second point.
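A sketch of what this finite-visibility guarantee does and does not buy (the spin_until_visible helper is my own illustration): the relaxed load is free to return the old value 0 for some number of iterations, but the loop still terminates, because the implementation should make the store visible within a reasonable amount of time:

```cpp
#include <atomic>
#include <thread>

std::atomic<int> flag{0};

long spin_until_visible() {
    std::thread writer([] { flag.store(2, std::memory_order_relaxed); });
    long stale_loads = 0;
    // The first loads may legitimately return 0: nothing requires immediate
    // visibility. Termination relies only on [29.3/12] / [1.10/28].
    while (flag.load(std::memory_order_relaxed) != 2) ++stale_loads;
    writer.join();
    return stale_loads;  // how many times the old value 0 was observed
}
```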

Given all that my initial understanding was correct.

answered Oct 19 '22 by ixSci