From the link: What is the difference between load/store relaxed atomic and normal variable?
I was deeply impressed by this answer:
Using an atomic variable solves the problem - by using atomics all threads are guaranteed to read the latest written value even if the memory order is relaxed.
Today, I read the link below: https://preshing.com/20140709/the-purpose-of-memory_order_consume-in-cpp11/
atomic<int*> Guard(nullptr);
int Payload = 0;
thread1:
Payload = 42;
Guard.store(&Payload, memory_order_release);
thread2:
g = Guard.load(memory_order_consume);
if (g != nullptr)
p = *g;
QUESTION: I learned that a data dependency prevents the related instructions from being reordered. But I think that is obviously required to ensure correct execution results, regardless of whether consume-release semantics exist. So I wonder what consume-release really does. Oh, maybe it uses data dependencies to prevent reordering of instructions while ensuring the visibility of Payload?
So
Is it possible to get the same correct result using memory_order_relaxed if I 1. prevent the instructions from being reordered and 2. ensure the visibility of the non-atomic variable Payload:
atomic<int*> Guard(nullptr);
volatile int Payload = 0; // 1.Payload is volatile now
// 2.Payload.assign and Guard.store in order for data dependency
Payload = 42;
Guard.store(&Payload, memory_order_release);
// 3.data dependency keeps the w/r of g/p in order
g = Guard.load(memory_order_relaxed);
if (g != nullptr)
p = *g; // 4. Given 1, 2 and 3 there is no reordering, and here the volatile Payload makes the value 42 visible.
Additional content (because of Sneftel's answer):
1.Payload = 42; volatile makes the writes/reads of Payload go to/from main memory rather than to/from cache. So 42 will be written to memory.
2.Guard.store(&Payload, any memory-order flag usable for writing); Guard is non-volatile as you said, but it is atomic
Using an atomic variable solves the problem - by using atomics all threads are guaranteed to read the latest written value even if the memory order is relaxed.
In fact, atomics are always thread-safe, regardless of the memory order! The memory order is not for the atomics -> it's for non-atomic data.
So after Guard.store performs, Guard.load (with any memory-order flag usable for reading) can get the address of Payload correctly, and then get the 42 from memory correctly.
Above code:
1.no reordering, because of the data dependency
2.no cache effect, because Payload is volatile
3.no thread-safety problem, because Guard is atomic
Can I get the correct value - 42?
Back to the main question
When you use consume semantics, you’re basically trying to make the compiler exploit data dependencies on all those processor families. That’s why, in general, it’s not enough to simply change memory_order_acquire to memory_order_consume. You must also make sure there are data dependency chains at the C++ source code level.
" You must also make sure there are data dependency chains at the C++ source code level."
I think the data dependency chains at the C++ source code level prevents instruction are reordered naturally. So What does memory_order_consume really do?
And can I use memory_order_relaxed to achieve the same result as above code?
Additional content end
memory_order_acquire: Syncs reading this atomic variable AND makes sure relaxed vars written before this are synced as well. (Does this mean all atomic variables on all threads are synced?)
memory_order_release: Pushes the atomic store to other threads (but only if they read the var with consume/acquire)
Relaxed ordering: Atomic operations tagged memory_order_relaxed are not synchronization operations; they do not impose an order among concurrent memory accesses. They only guarantee atomicity and modification order consistency.
The memory model means that C++ code now has a standardized library to call regardless of who made the compiler and on what platform it's running. There's a standard way to control how different threads talk to the processor's memory.
First of all, memory_order_consume is temporarily discouraged by the ISO C++ committee until they come up with something compilers can actually implement. For a few years now, compilers have treated consume as a synonym for acquire. See the section at the bottom of this answer.

Hardware still provides the data dependency, so it's interesting to talk about that, despite not having any safely portable ISO C++ way to take advantage of it currently. (Only hacks with mo_relaxed or hand-rolled atomics, and careful coding based on an understanding of compiler optimizations and asm, kind of like what you're trying to do with relaxed. But you don't need volatile.)
Oh, maybe it uses data dependencies to prevent reordering of instructions while ensuring the visibility of Payload?
Not exactly "reordering of instructions", but memory reordering. As you say, sanity and causality are enough in this case if the hardware provides dependency ordering. C++ is portable to machines that don't (e.g. DEC Alpha).
The normal way to get visibility for Payload is via release-store in the writer, acquire load in the reader which sees the value from that release-store. https://preshing.com/20120913/acquire-and-release-semantics/. (So of course repeatedly storing the same value to a "ready_flag" or pointer doesn't let the reader figure out whether it's seeing a new or old store.)
Release / acquire creates a happens-before synchronization relationship between the threads, which guarantees visibility of everything the writer did before the release-store. (consume doesn't, that's why only the dependent loads are ordered.)
(consume is an optimization on this: it avoids a memory barrier in the reader by letting the compiler take advantage of hardware guarantees, as long as you follow some dependency rules.)
You have some misconceptions about what CPU cache is, and about what volatile does, which I commented about under the question. A release-store makes sure earlier non-atomic assignments are visible in memory.

(Also, cache is coherent; it provides all CPUs with a shared view of memory that they can agree on. Registers are thread-private and not coherent; that's what people mean when they say a value is "cached". Registers are not CPU cache, but software can use them to hold a copy of something from memory. When to use volatile with multi-threading? Never, but it does have some effects on real CPUs because they have coherent cache. It's a bad way to roll your own mo_relaxed. See also https://software.rajivprab.com/2018/04/29/myths-programmers-believe-about-cpu-caches/)
In practice on real CPUs, memory reordering happens locally within each core; cache itself is coherent and never gets "out of sync". (Other copies are invalidated before a store can become globally visible.) So release just has to make sure the local CPU's stores become globally visible (commit to L1d cache) in the right order. ISO C++ doesn't specify any of that level of detail, and an implementation that worked very differently is hypothetically possible.
Making the writer's store volatile is irrelevant in practice because a non-atomic assignment followed by a release-store already has to make everything visible to other threads that might do an acquire-load and sync with that release store. It's irrelevant on paper in pure ISO C++ because it doesn't avoid data-race UB.
(Of course, it's theoretically possible for whole-program optimization to see that there are no acquire or consume loads that would ever load this store, and optimize away the release property. But compilers currently don't optimize atomics in general even locally, and never try to do that kind of whole-program analysis. So code-gen for writer functions will assume that there might be a reader that syncs with any given store of release or seq_cst ordering.)
What does memory_order_consume really do?
One thing mo_consume does is to make sure the compiler uses a barrier instruction on implementations where the underlying hardware doesn't provide dependency ordering naturally / for free. In practice that means only on DEC Alpha. See Dependent loads reordering in CPU / Memory order consume usage in C11.
Your question is a near duplicate of C++11: the difference between memory_order_relaxed and memory_order_consume - see the answers there for the body of your question about misguided attempts to do stuff with volatile and relaxed. (I'm mostly answering because of the title question.)
It also ensures that the compiler uses a barrier at some point before execution passes into code that doesn't know about the data dependency this value carries (i.e. no [[carries_dependency]] tag on the function arg in the declaration). Such code might replace x-x with a constant 0 and optimize away, losing the data dependency. But code that knows about the dependency would have to use something like a sub r1, r1, r1 instruction to get a zero with a data dependency.
That can't happen for your use-case (where relaxed will work in practice on ISAs other than Alpha), but the on-paper design of mo_consume allowed all kinds of stuff that would require different code-gen from what compilers would normally do. This is part of what made it so hard to implement efficiently that compilers just promote it to mo_acquire.
The other part of the problem is that it requires code to be littered with kill_dependency and/or [[carries_dependency]] all over the place, or you'll end up with a barrier at function boundaries anyway. These problems led the ISO C++ committee to temporarily discourage consume.
Hardware dependency ordering is what consume is intended to expose to software. Out-of-order exec can only reorder independent work anyway, and can't start a load before the load address is known, so on most CPUs enforcing dependency ordering happens for free: only a few models of DEC Alpha could violate causality and effectively load data from before it had the pointer that gave it the address.

And BTW:
The example code is safe with release + consume regardless of volatile. It's safe on most compilers and most ISAs in practice with a release store + relaxed load, although of course ISO C++ has nothing to say about the correctness of that code. But with the current state of compilers, that's a hack that some code makes use of (like the Linux kernel's RCU).

If you need that level of read-side scaling, you'll have to work outside of what ISO C++ guarantees. That means your code will have to make assumptions about how compilers work (and that you're running on a "normal" ISA that isn't DEC Alpha), which means you need to support some set of compilers (and maybe ISAs, although there aren't many multi-core ISAs around). The Linux kernel only cares about a few compilers (mostly recent GCC, also clang I think), and the ISAs that they have kernel code for.
volatile has nothing to do with multi-threading in C/C++; its visibility side effect is defined only for single-threaded programs, and it is usually used only to tell the compiler not to optimize away accesses to a value. It is DIFFERENT from Java/C#.
release/consume is all about data dependency, and it may build a dependency chain (which can be broken by kill_dependency to avoid unnecessary barriers later).
release/acquire forms a pair-wise synchronizes-with / inter-thread happens-before relationship.
For your case, release/acquire would form the expected happens-before relationship. release/consume will also work, because *g is dependent on g.
But note that with current compilers, consume is treated as a synonym for acquire, because it proved too hard to implement efficiently. See the other answer.