Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why 'acquire/release' can not guarantee sequential consistency in c++11?

-Thread 1-                
y.store (20, memory_order_release); 
x.store (10, memory_order_release);

-Thread 2-
if (x.load(memory_order_acquire) == 10) {
  assert (y.load(memory_order_acquire) == 20);
  y.store (10, memory_order_release)
}

-Thread 3-
if (y.load(memory_order_acquire) == 10) {
  assert (x.load(memory_order_acquire) == 10);
}

GCC Atomic Wiki paragraph “Overall Summary” says, the above code assert(x.load(memory_order_acquire)) can fail. But I don't understand why ?

My understanding is:

  1. Thread3 can not LoadLoad reorder due to acquire barrier.
  2. Thread1 can not StoreStore reorder due to release barrier.
  3. When Thread2 read(x)->10, x must be flushed from storebuffer to cache in Thread1, so every thread know the value x has changed, such as invalidate cache line.
  4. Thread3 uses Acquire barrier, so it can see x(10).
like image 379
Pengcheng Avatar asked May 21 '18 01:05

Pengcheng


1 Answers

This is a bad example, though it does illustrate how mind-warping relaxed atomics can be, I suppose.

[intro.execution]p9:

Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated.

[atomics.order]p2:

An atomic operation A that performs a release operation on an atomic object M synchronizes with an atomic operation B that performs an acquire operation on M and takes its value from any side effect in the release sequence headed by A.

As a result, the evaluations shown are chained together by sequenced-before and synchronized-with relationships:

Thread 1                   Thread 2              Thread 3

y.store(20)
   |
   | s.b.
   V           s.w.
x.store(10)  -------->  x.load() == 10
                               |
                               | s.b.
                               V      s.w.
                        y.store(10) --------> y.load() == 10
                                                  |
                                                  | s.b.
                                                  V
                                              x.load() == ?

And so each evaluation in the chain happens before the next (see [intro.races]p9-10).

[intro.races]p15,

If a value computation A of an atomic object M happens before a value computation B of M, and A takes its value from a side effect X on M, then the value computed by B shall either be the value stored by X or the value stored by a side effect Y on M, where Y follows X in the modification order of M.

Here, A is the load in Thread 2 that took the value 10, B is the load in Thread 3 (in the assert). Since A happens before B, and there are no other side effects on x, B must also read 10.


Herb Sutter has a much simpler example on his blog:

T1: x = 1;
T2: y = 1;
T3: if( x == 1 && y == 0 ) puts("x first");
T4: if( y == 1 && x == 0 ) puts("y first");

You absolutely need sequential consistency to guarantee that at most one line is printed.

like image 121
T.C. Avatar answered Oct 21 '22 20:10

T.C.