
Should load-acquire see store-release immediately?

Suppose we have a single atomic variable (std::atomic<int> var) and two threads, T1 and T2, with the following code for T1:

...
var.store(2, mem_order);
...

and for T2

...
var.load(mem_order)
...

Also let's assume that the T2 load executes 123 ns later in time (later in the modification order, in terms of the C++ standard) than the T1 store. My understanding of this situation is as follows (for the different memory orders):

  1. memory_order_seq_cst - the T2 load is obliged to load 2. Effectively it has to load the latest value (just as is the case with RMW operations).
  2. memory_order_acquire/memory_order_release/memory_order_relaxed - T2 is not obliged to load 2 but may load any older value, with the only restriction that the value must not be older than the latest one already loaded by that thread. So, for example, var.load may return 0.
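For concreteness, the scenario can be written as a minimal, compilable sketch (the run_once helper and the thread bodies are my own illustration, not from the question). The load inside T2 may legitimately return either 0 or 2; only the final load in the parent, which is ordered after both threads by the joins, is guaranteed to read 2:

```cpp
#include <atomic>
#include <thread>

std::atomic<int> var{0};

int run_once() {
    // T1: publishes the value 2.
    std::thread t1([] { var.store(2, std::memory_order_release); });
    // T2: nothing orders this load after the store, so it may see 0 or 2.
    std::thread t2([] { (void)var.load(std::memory_order_acquire); });
    t1.join();
    t2.join();
    // The joins order this load after the store, so it must read 2.
    return var.load(std::memory_order_relaxed);
}
```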

Am I right with my understanding?

UPDATE1:

If I'm wrong in my reasoning, please provide the text from the C++ standard which proves it, not just theoretical reasoning about how some architecture might work.

asked Jun 11 '15 by ixSci



2 Answers

Am I right with my understanding?

No. You misunderstand memory orders.

let's assume that T2(load) executes 123ns later than T1(store)...

In that case, T2 will see what T1 did, with any memory order (moreover, this property applies to reads/writes of any memory location; see e.g. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4431.pdf, 1.10, p.15). The key word in your phrase is later: it means that someone else forces an ordering of these operations.

Memory orders are used for a different scenario:

Let some operation OP1 come in thread T1 before the store operation and OP2 come after it; let OP3 come in thread T2 before the load operation and OP4 come after it:

//T1:                         //T2:
OP1                           OP3
var.store(2, mem_order)       var.load(mem_order)
OP2                           OP4

Assume that some order between var.store() and var.load() can be observed by the threads. What can one guarantee about the cross-thread order of the other operations?

  1. If var.store uses memory_order_release, var.load uses memory_order_acquire, and var.store is ordered before var.load (that is, the load returns 2), then the effect of OP1 is ordered before OP4.

E.g., if OP1 writes some variable var1 and OP4 reads that variable, then one can be sure that OP4 will read what OP1 wrote before. This is the most common case.

  2. If both var.store and var.load use memory_order_seq_cst and var.store is ordered after var.load (that is, the load returns 0, the value of the variable before the store), then the effect of OP2 is ordered after OP3.

This memory order is required by some tricky synchronization schemes.

  3. If either var.store or var.load uses memory_order_relaxed, then with any order of var.store and var.load one can guarantee no cross-thread order of the other operations.

This memory order is used when someone else ensures the order of operations. E.g., if the creation of thread T2 comes after var.store in T1, then OP3 and OP4 are ordered after OP1.
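Case 1 above is the classic message-passing pattern. Here is a minimal sketch of it (the names var1 and acquire_release_demo and the spin loop are illustrative assumptions, not from the answer). OP1 writes the plain variable var1, the release store publishes it, and once the acquire load observes 2, OP4 is guaranteed to see OP1's write:

```cpp
#include <atomic>
#include <thread>

int var1 = 0;               // plain, non-atomic data (written by OP1)
std::atomic<int> var{0};

int acquire_release_demo() {
    std::thread t1([] {
        var1 = 42;                                // OP1
        var.store(2, std::memory_order_release);  // publish
    });
    int seen = 0;
    std::thread t2([&] {
        // Spin until the store is observed; acquire pairs with the release.
        while (var.load(std::memory_order_acquire) != 2) {}
        seen = var1;                              // OP4: guaranteed to read 42
    });
    t1.join();
    t2.join();
    return seen;
}
```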

UPDATE: 123 ns later implies that *someone else* forces the ordering, because a computer's processor has no notion of universal time, and no operation has a precise moment at which it is executed. To measure the time between two operations you would have to:

  1. Observe an ordering between the finish of the first operation and the start of a time-counting operation on some CPU.
  2. Observe an ordering between the start and the finish of the time-counting operation.
  3. Observe an ordering between the finish of the time-counting operation and the start of the second operation.

Transitively, these steps establish an ordering between the first operation and the second.

answered Oct 19 '22 by Tsyvarev


Having found no arguments proving my understanding wrong, I deem it correct; my proof is as follows:

memory_order_seq_cst - T2 load is obliged to load 2.

That's correct because all operations using memory_order_seq_cst must form a single total order, consistent with the modification orders of the affected atomic objects. Excerpt from the standard:

[29.9/3] There shall be a single total order S on all memory_order_seq_cst operations, consistent with the “happens before” order and modification orders for all affected locations, such that each memory_order_seq_cst operation B that loads a value from an atomic object M observes one of the following values <...>

The next point of my question:

memory_order_acquire/memory_order_release/memory_order_relaxed - T2 is not obliged to load 2 but can load any older value <...>

I didn't find any evidence indicating that a load executed later in the modification order must see the latest value. The only relevant passages I found for store/load operations with any memory order other than memory_order_seq_cst are these:

[29.3/12] Implementations should make atomic stores visible to atomic loads within a reasonable amount of time.

and

[1.10/28] An implementation should ensure that the last value (in modification order) assigned by an atomic or synchronization operation will become visible to all other threads in a finite period of time.

So the only guarantee we have is that the value written will become visible within some finite time - a pretty reasonable guarantee, but it does not imply immediate visibility of the previous store. And that proves my second point.
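A sketch of what this finite-visibility guarantee does and does not buy (the spin_until_visible helper is my own illustration): the relaxed load is free to return the old value 0 for some number of iterations, but the loop still terminates, because the implementation should make the store visible within a reasonable amount of time:

```cpp
#include <atomic>
#include <thread>

std::atomic<int> flag{0};

long spin_until_visible() {
    std::thread writer([] { flag.store(2, std::memory_order_relaxed); });
    long stale_loads = 0;
    // The first loads may legitimately return 0: nothing requires immediate
    // visibility. Termination relies only on [29.3/12] / [1.10/28].
    while (flag.load(std::memory_order_relaxed) != 2) ++stale_loads;
    writer.join();
    return stale_loads;  // how many times the old value 0 was observed
}
```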

Given all that my initial understanding was correct.

answered Oct 19 '22 by ixSci