I tried looking for details on this, and I even read the standard on mutexes and atomics, but I still couldn't understand the C++11 memory model visibility guarantees. From what I understand, a very important feature of a mutex BESIDES mutual exclusion is ensuring visibility. That is, it is not enough that only one thread at a time increases the counter; it is important that the thread increases the counter value that was stored by the thread that last used the mutex (I really don't know why people don't mention this more when discussing mutexes, maybe I had bad teachers :)). So from what I can tell, atomic doesn't enforce immediate visibility (from the person who maintains boost::thread and has implemented a C++11 thread and mutex library):
A fence with memory_order_seq_cst does not enforce immediate visibility to other threads (and neither does an MFENCE instruction). The C++0x memory ordering constraints are just that --- ordering constraints. memory_order_seq_cst operations form a total order, but there are no restrictions on what that order is, except that it must be agreed on by all threads, and it must not violate other ordering constraints. In particular, threads may continue to see "stale" values for some time, provided they see values in an order consistent with the constraints.
And I'm OK with that. But the problem is that I have trouble understanding which C++11 constructs regarding atomics are "global" and which only ensure consistency on atomic variables. In particular, I have trouble understanding which (if any) of the following memory orderings guarantee that there will be a memory fence before and after loads and stores: http://www.stdthread.co.uk/doc/headers/atomic/memory_order.html
From what I can tell, std::memory_order_seq_cst inserts a memory barrier, while the others only enforce ordering of the operations on a certain memory location.
So can somebody clear this up? I presume a lot of people are going to be making horrible bugs using std::atomic, especially if they don't use the default (std::memory_order_seq_cst) memory ordering.
2. If I'm right, does that mean that the second line is redundant in this code:

    atomicVar.store(42);
    std::atomic_thread_fence(std::memory_order_seq_cst);

3. Do std::atomic_thread_fences have the same requirements as mutexes, in the sense that to ensure sequential consistency on non-atomic variables one must do std::atomic_thread_fence(std::memory_order_seq_cst); before loads and std::atomic_thread_fence(std::memory_order_seq_cst); after stores?
4. Is

    {
        regularSum += atomicVar.load();
        regularVar1++;
        regularVar2++;
    }
    //...
    {
        regularVar1++;
        regularVar2++;
        atomicVar.store(74656);
    }

equivalent to

    std::mutex mtx;
    {
        std::unique_lock<std::mutex> ul(mtx);
        regularSum += nowRegularVar;
        regularVar1++;
        regularVar2++;
    }
    //...
    {
        std::unique_lock<std::mutex> ul(mtx);
        regularVar1++;
        regularVar2++;
        nowRegularVar = 74656;
    }

I think not, but I would like to be sure.
EDIT: 5. Can the assert fire?
Only two threads exist.

    std::atomic<int*> p(nullptr);

The first thread writes:

    {
        int* nonatomic_p = (int*) malloc(16*1024*sizeof(int));
        for (int i = 0; i < 16*1024; ++i)
            nonatomic_p[i] = 42;
        p = nonatomic_p;
    }

The second thread reads:

    {
        while (p == nullptr) { }
        assert(p[1234] == 42); // 1234 is a random index into the array
    }

There are six memory orderings specified in the C++ standard: memory_order_relaxed, memory_order_consume, memory_order_acquire, memory_order_release, memory_order_acq_rel and memory_order_seq_cst. You can specify one of these memory orderings on each atomic operation.
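As a rough illustration of how an ordering is passed to an individual atomic operation, here is a minimal sketch. The names counter, flag and demo_orderings are hypothetical and exist only for this example; it runs on a single thread, so it shows the syntax rather than any cross-thread effect.

```cpp
#include <atomic>

// Hypothetical variables, named only for this illustration.
std::atomic<int> counter{0};
std::atomic<bool> flag{false};

int demo_orderings() {
    counter.store(1);                             // no argument: memory_order_seq_cst
    counter.store(2, std::memory_order_relaxed);  // atomic, but imposes no ordering
    flag.store(true, std::memory_order_release);  // earlier writes become visible to an acquirer
    bool seen = flag.load(std::memory_order_acquire);  // pairs with a release store
    // Read-modify-write: acquire on the read part, release on the write part.
    int before = counter.fetch_add(1, std::memory_order_acq_rel);
    return seen ? before : -1;
}
```

Called once from main, demo_orderings() returns the value counter held before the increment.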
Memory ordering describes the order of accesses to computer memory by a CPU.
Essentially memory_order_acq_rel provides read and write orderings relative to the atomic variable, while memory_order_seq_cst provides read and write ordering globally. That is, the sequentially consistent operations are visible in the same order across all threads.
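The difference can be sketched with the classic "store buffer" test, assuming two threads that each store to one variable and then load the other. With memory_order_seq_cst on all four operations, the single total order means at least one thread must observe the other's store, so both loads cannot return 0. With only release stores and acquire loads, that outcome would be permitted. The names Result and store_buffer_once are made up for this sketch.

```cpp
#include <atomic>
#include <thread>

struct Result { int r1; int r2; };

// Runs one round of the store-buffer test with sequentially consistent operations.
Result store_buffer_once() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;
    std::thread a([&] {
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread b([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    a.join();  // join makes r1 and r2 safe to read afterwards
    b.join();
    return {r1, r2};
}
```

Repeating store_buffer_once() many times may show (1, 0), (0, 1) or (1, 1), but never (0, 0).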
The ARMv8 architecture employs a weakly-ordered model of memory. In general terms, this means that the order of memory accesses is not required to be the same as the program order for load and store operations. The processor is able to re-order memory read operations with respect to each other.
If you like to deal with fences, then a.load(memory_order_acquire) is equivalent to a.load(memory_order_relaxed) followed by atomic_thread_fence(memory_order_acquire). Similarly, a.store(x,memory_order_release) is equivalent to a call to atomic_thread_fence(memory_order_release) before a call to a.store(x,memory_order_relaxed). memory_order_consume is a special case of memory_order_acquire, for dependent data only. memory_order_seq_cst is special, and forms a total order across all memory_order_seq_cst operations. Mixed with the others it is the same as an acquire for a load, and a release for a store. memory_order_acq_rel is for read-modify-write operations, and is equivalent to an acquire on the read part and a release on the write part of the RMW.
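The fence-based forms described above can be sketched as follows. This is only an illustration: ready_flag, payload, producer, consumer and run_fence_demo are hypothetical names, and the relaxed operations are paired with explicit fences instead of carrying the ordering themselves.

```cpp
#include <atomic>
#include <thread>

std::atomic<bool> ready_flag{false};  // hypothetical flag for this sketch
int payload = 0;                      // ordinary, non-atomic data

void producer() {
    payload = 42;
    // Release fence before a relaxed store, in place of store(true, memory_order_release).
    std::atomic_thread_fence(std::memory_order_release);
    ready_flag.store(true, std::memory_order_relaxed);
}

int consumer() {
    while (!ready_flag.load(std::memory_order_relaxed))
        std::this_thread::yield();
    // Acquire fence after a relaxed load, in place of load(memory_order_acquire).
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;  // ordered after the write in producer()
}

int run_fence_demo() {
    ready_flag.store(false);
    payload = 0;
    int seen = 0;
    std::thread reader([&] { seen = consumer(); });
    std::thread writer(producer);
    writer.join();
    reader.join();
    return seen;
}
```

Whether this compiles to the same instructions as the acquire and release operations depends on the compiler and target.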
The use of ordering constraints on atomic operations may or may not result in actual fence instructions, depending on the hardware architecture. In some cases the compiler will generate better code if you put the ordering constraint on the atomic operation rather than using a separate fence.
On x86, loads are always acquire, and stores are always release. memory_order_seq_cst requires stronger ordering with either an MFENCE instruction or a LOCK prefixed instruction (there is an implementation choice here as to whether to make the store have the stronger ordering or the load). Consequently, standalone acquire and release fences are no-ops, but atomic_thread_fence(memory_order_seq_cst) is not (again requiring an MFENCE or LOCKed instruction).
An important effect of the ordering constraints is that they order other operations.

    std::atomic<bool> ready(false);
    int i=0;

    void thread_1()
    {
        i=42;
        ready.store(true,memory_order_release);
    }

    void thread_2()
    {
        while(!ready.load(memory_order_acquire))
            std::this_thread::yield();
        assert(i==42);
    }

thread_2 spins until it reads true from ready. Since the store to ready in thread_1 is a release, and the load is an acquire, the store synchronizes-with the load, and the store to i happens-before the load from i in the assert, and the assert will not fire.
2) The second line in

    atomicVar.store(42);
    std::atomic_thread_fence(std::memory_order_seq_cst);

is indeed potentially redundant, because the store to atomicVar uses memory_order_seq_cst by default. However, if there are other non-memory_order_seq_cst atomic operations on this thread then the fence may have consequences. For example, it would act as a release fence for a subsequent a.store(x,memory_order_relaxed).
3) Fences and atomic operations do not work like mutexes. You can use them to build mutexes, but they do not work like them. You do not have to ever use atomic_thread_fence(memory_order_seq_cst). There is no requirement that any atomic operations are memory_order_seq_cst, and ordering on non-atomic variables can be achieved without, as in the example above.
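To illustrate the remark about building mutexes from atomics, here is a minimal spinlock sketch using std::atomic_flag with acquire and release orderings only. SpinLock and count_with_spinlock are names invented for this example, and a real program would normally just use std::mutex.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// A toy lock: acquire on lock(), release on unlock(), no seq_cst anywhere.
class SpinLock {
    std::atomic_flag locked = ATOMIC_FLAG_INIT;
public:
    void lock() {
        // test_and_set returns the previous value; spin while it was already set.
        while (locked.test_and_set(std::memory_order_acquire))
            std::this_thread::yield();
    }
    void unlock() { locked.clear(std::memory_order_release); }
};

// Increments a plain (non-atomic) counter from several threads under the lock.
long count_with_spinlock(int threads, int iterations) {
    SpinLock lock;
    long total = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++total;  // protected by the lock, so no data race
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return total;
}
```

Each unlock() synchronizes-with the next successful lock(), which is what makes the updates to the plain counter visible to the next owner.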
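4) No, these are not equivalent. The atomic load and store do not provide mutual exclusion, so nothing stops two threads from modifying regularVar1 and regularVar2 at the same time. Your snippet without the mutex lock is thus a data race and undefined behaviour.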
5) No, your assert cannot fire. With the default memory ordering of memory_order_seq_cst, the store and load from the atomic pointer p work like the store and load in my example above, and the stores to the array elements are guaranteed to happen-before the reads.
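A self-contained version of that two-thread scenario might look like the sketch below. It is an adaptation rather than the code from the question: publish_and_read and kSize are hypothetical names, new[] replaces malloc, and the reader copies the pointer into a local before indexing.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>

// Fills a buffer on one thread, publishes it through an atomic pointer,
// and reads one element on another thread.
int publish_and_read(std::size_t index) {
    constexpr std::size_t kSize = 16 * 1024;
    std::atomic<int*> p{nullptr};
    int seen = 0;

    std::thread reader([&] {
        int* local = nullptr;
        // Default ordering is memory_order_seq_cst, which acts as an acquire here.
        while ((local = p.load()) == nullptr)
            std::this_thread::yield();
        seen = local[index];
    });

    std::thread writer([&] {
        int* buffer = new int[kSize];
        for (std::size_t i = 0; i < kSize; ++i)
            buffer[i] = 42;
        p.store(buffer);  // seq_cst store, acts as a release: publishes the filled buffer
    });

    writer.join();
    reader.join();
    delete[] p.load();
    return seen;
}
```

For any index below 16*1024, the element read by the second thread is the 42 written before the pointer was published.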