I have the following C++11 code:
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> x, y;
std::atomic<int> z;

void f() {
    x.store(true, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);
    y.store(true, std::memory_order_relaxed);
}

void g() {
    while (!y.load(std::memory_order_relaxed)) {}
    std::atomic_thread_fence(std::memory_order_acquire);
    if (x.load(std::memory_order_relaxed)) ++z;
}

int main() {
    x = false;
    y = false;
    z = 0;
    std::thread t1(f);
    std::thread t2(g);
    t1.join();
    t2.join();
    assert(z.load() != 0);
    return 0;
}
In my computer architecture class we've been told that the assert in this code always holds. But after reviewing it thoroughly, I can't really see why that is.
From what I know, if my understanding is correct, why can't the following sequence of actions occur?

1. y.store(true, std::memory_order_relaxed); is called

I think this complies with the acquire/release rules, but, for example, the best answer to this question: Understanding c++11 memory fences, which is very similar to my case, hints that something like step 1 in my sequence of actions cannot happen before the memory_order_release fence, yet it doesn't go into details about the reason behind it.
I'm terribly puzzled by this, and would be very glad if anyone could shed some light on it :)
An operation has acquire semantics if other processors will always see its effect before any subsequent operation's effect. An operation has release semantics if other processors will see every preceding operation's effect before the effect of the operation itself.
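A minimal sketch of the kind of example usually used to illustrate this (my own illustration, not a quoted one): a writer publishes a plain value with a release store, and a reader waits for it with an acquire load.

#include <atomic>
#include <cassert>
#include <thread>

int data = 0;                      // plain, non-atomic payload
std::atomic<bool> ready{false};

void producer() {
    data = 42;                                     // (1) write the payload
    ready.store(true, std::memory_order_release);  // (2) release: publishes step (1)
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) {}  // acquire: pairs with the release store
    assert(data == 42);                                 // guaranteed to see the payload
}

int main() {
    std::thread t1(producer);
    std::thread t2(consumer);
    t1.join();
    t2.join();
    return 0;
}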
In computing, a memory barrier, also known as a membar, memory fence or fence instruction, is a type of barrier instruction that causes a central processing unit (CPU) or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction.
The default is std::memory_order_seq_cst, which establishes a single total ordering over all atomic operations tagged with it: all threads see the same order of such atomic operations, and no memory_order_seq_cst atomic operations can be reordered.

memory_order_acquire: a load operation with this memory order performs the acquire operation on the affected memory location; no reads or writes in the current thread can be reordered before this load.
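To tie these orderings back to the question's code, here is a sketch (my own, reusing the x, y and z declared in the question) of the same two functions written with release/acquire applied directly to y instead of the stand-alone fences; it gives the same guarantee for the assert.

void f_alt() {
    x.store(true, std::memory_order_relaxed);
    y.store(true, std::memory_order_release);      // release store: anything written before it
                                                   // is visible to an acquire load that sees it
}

void g_alt() {
    while (!y.load(std::memory_order_acquire)) {}  // acquire load: pairs with the release store
    if (x.load(std::memory_order_relaxed)) ++z;    // therefore guaranteed to see x == true
}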
Exactly what happens in each of these cases depends on what processor you are actually using. For example, x86 would probably never trip this assert, since it is a strongly ordered, cache-coherent architecture: you can have race conditions, but once a value is written out from one processor to cache/memory, all other processors will read that value (which of course doesn't stop another processor from writing a different value immediately after, etc.).
So, assuming this is running on ARM or a similar processor whose memory model does not guarantee this ordering by itself:
Because the write to x is done before the memory_order_release fence, the t2 loop will not exit the while (y...) until the store to x is visible as well: the release fence before the store to y, combined with the acquire fence after the load of y, guarantees that once y is seen as true, x must be seen as true too. This means that when x is read later on, it is guaranteed to be true, so z is updated. My only slight query is whether you don't need a release for z as well... If main is running on a different processor than t1 and t2, then z may still have a stale value in main.
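To make it concrete why the fences matter here, a sketch (mine, not part of the original code) of the same pair of functions with both fences removed: with only relaxed operations, nothing orders the two stores relative to each other, so on a weakly ordered machine the assert really can fail.

void f_no_fence() {
    x.store(true, std::memory_order_relaxed);
    y.store(true, std::memory_order_relaxed);    // may become visible to other threads
                                                 // before the store to x does
}

void g_no_fence() {
    while (!y.load(std::memory_order_relaxed)) {}
    if (x.load(std::memory_order_relaxed)) ++z;  // can still read x == false, leaving z == 0
}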
Of course, that's not GUARANTEED to happen if you have a multitasking OS (or just interrupts that do enough stuff, etc) - since if the processor that ran t1 gets its cache flushed, then t2 may well read the new value of x.
And like I said, this won't have that effect on x86 processors (AMD or Intel ones).
So, to explain barrier instructions in general (also applicable to Intel and AMD processors):
First, we need to understand that although instructions can start and finish out of order, the processor does have a general "understanding" of order. Let's say we have this "pseudo-machine-code":
    ...
    mov $5, x      // store 5 to x
    cmp a, b       // compare a and b
    jnz L1         // jump to L1 if they differ
    mov $4, x      // store 4 to x (only reached if the branch is not taken)
L1: ...
The processor could speculatively execute mov $4, x before it completes the jnz L1; so, to handle this, the processor has to roll back the mov $4, x in the case where the jnz L1 is taken.
Likewise, if we have:
    mov $1, x
    wmb            // "write memory barrier"
    mov $1, y
the processor has rules to say "do not execute any store instruction issued AFTER the wmb until all stores before it have been completed". It is a "special" instruction - it's there for the precise purpose of guaranteeing memory ordering. If it's not doing that, you have a broken processor, and someone in the design department has "his ass on the line".
Equally, the "read memory barrier" is an instruction guaranteed, by the designers of the processor, to ensure that the processor will not complete another read until the pending reads before the barrier instruction have completed.
As long as we're not working on "experimental" processors or some skanky chip that doesn't work correctly, it WILL work that way. It's part of the definition of that instruction. Without such guarantees, it would be impossible (or at least extremely complicated and "expensive") to implement (safe) spinlocks, semaphores, mutexes, etc.
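Mapping this back to the C++11 code from the question (a rough correspondence only: the exact instructions the compiler emits depend on the target, and on x86 these fences typically compile to nothing more than a compiler-level barrier):

void f() {
    x.store(true, std::memory_order_relaxed);             // mov $1, x
    std::atomic_thread_fence(std::memory_order_release);  // wmb: earlier stores become visible
                                                           // before later stores do
    y.store(true, std::memory_order_relaxed);             // mov $1, y
}

void g() {
    while (!y.load(std::memory_order_relaxed)) {}          // spin until the store to y is seen
    std::atomic_thread_fence(std::memory_order_acquire);   // rmb: reads after the fence happen
                                                            // after the reads before it
    if (x.load(std::memory_order_relaxed)) ++z;             // therefore guaranteed to see x == true
}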
There are often also "implicit memory barriers" - that is, instructions that act as memory barriers even though that is not their primary purpose. Software interrupts ("INT X" instruction or similar) tend to do this.