As described in other posts, without any sort of volatile or std::atomic qualification, the compiler and/or processor is at liberty to re-order the sequence of statements (e.g. assignments):
// this code
int a = 2;
int b = 3;
int c = a;
b = c;
// could be re-ordered/re-written as the following by the compiler/processor
int c = 2;
int a = c;
int b = a;
However, are conditionals and control statements (e.g. if, while, for, switch, goto) also allowed to be re-ordered, or are they considered essentially a "memory fence"?
int* a = &previously_defined_int;
int b = *a;
if (a == some_predefined_ptr)
{
a = some_other_predefined_ptr; // is this guaranteed to occur after "int b = *a"?
}
If the above statements could be re-ordered (e.g. store a in a temporary register, update a, then populate b by de-reference the "old" a in the temporary register), which I guess they could be while still satisfying the same "abstract machine" behavior in a single-threaded environment, then why aren't there problems when using locks/mutexes?
bool volatile locked = false; // assume on given implementation, "locked" reads/writes occur in 1 CPU instruction
// volatile so that compiler doesn't optimize out
void thread1(void)
{
while (locked) {}
locked = true;
// do thread1 things // couldn't these be re-ordered in front of "locked = true"?
locked = false;
}
void thread2(void)
{
while (locked) {}
locked = true;
// do thread2 things // couldn't these be re-ordered in front of "locked = true"?
locked = false;
}
Even if std::atomic was used, non-atomic statements can still be re-ordered around atomic statements, so that wouldn't help ensure that "critical section" statements (i.e. the "do threadX things") were contained within their intended critical section (i.e. between the locking/unlocking).
Edit: Actually, I realize the lock example doesn't actually have anything to do with the conditional/control statement question I asked. It would still be nice to have clarification on both of the questions asked though:
The "as-if" rule ([intro.abstract]) is important to note here:
The semantic descriptions in this document define a parameterized nondeterministic abstract machine. This document places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below
Anything* can be reordered so long as the implementation can guarantee that the observable behavior of the resulting program is unchanged.
Thread synchronization constructs in general cannot be implemented properly without fences and preventing reordering. For example, the standard guarantees that lock/unlock operations on mutexes shall behave as atomic operations. Atomics also explicitly introduce fences, especially with respect to the memory_order specified. This means that statements depending on an (unrelaxed) atomic operation cannot be reordered, otherwise the observable behavior of the program may change.
[intro.races] talks a great deal about data races and ordering.
Branches are very much the opposite of a memory fence in assembly. Speculative execution + out-of-order exec means that control dependencies are not data dependencies, so for example if(x) tmp = y; can load y without waiting for a cache miss on x. See memory model, how load acquire semantic actually works?
Of course in C++, this just means that no, if() doesn't help. The definitions of (essentially deprecated) memory_order_consume might even specify that if isn't a data dependency. Real compilers promote it to acquire because it's too hard to implement as originally specified, though.
So TL:DR: you still need atomics with mo_acquire and mo_release (or stronger) if you want to establish a happens-before between two threads. Using a relaxed variable in an if() doesn't help at all, and in fact makes reordering in practice on real CPUs easier.
And of course non-atomic variables aren't safe without synchronization. An if(data_ready.load(acquire)) is sufficient to protect access to a non-atomic variable, though. So is a mutex; mutex lock/unlock count as acquire and release operations on the mutex object, according to C++ definitions. (Many practical implementations involve full barriers, but formally C++ only guarantees acq and rel for mutexes)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With