Consider this code:
// global
std::atomic<bool> run = true;
// thread 1
while (run) { /* do stuff */ }
// thread 2
/* do stuff until it's time to shut down */
run = false;
Do I need the overhead associated with the atomic variable here? My intuition is that the read/write of a boolean variable is more or less atomic anyway (this is a common g++/Linux/Intel setup) and if there is some write/read timing weirdness, and my run loop on thread 1 stops one pass early or late as a result, I'm not super worried about it for this application.
Or is there some other consideration I am missing here? Looking at perf, it appears my code is spending a fair amount of time in std::atomic_bool::operator bool, and I'd rather have it in the loop instead.
You need to use std::atomic. With a plain bool, an unsynchronized read in one thread and write in another is a data race, which is undefined behavior in C++; in practice, the compiler is free to read the value once and then either always loop or never loop. The atomic also gives you correct behavior on systems without a strongly ordered memory model. x86 is strongly ordered, so once the write finishes, the next read will see it; on weakly ordered systems, without the guarantees the atomic provides, the write might not be seen by the other thread for a long time.
You can improve the performance, though. By default, std::atomic operations use sequentially consistent ordering, which is overkill for a single flag value. You can speed it up by using load/store with an explicit (and less strict) memory ordering, so each load isn't required to use the most paranoid mode of maintaining consistency.
For example, you could do:
// global
std::atomic<bool> run = true;
// thread 1
while (run.load(std::memory_order_acquire)) { /* do stuff */ }
// thread 2
/* do stuff until it's time to shut down */
run.store(false, std::memory_order_release);
On an x86 machine, any ordering less strict than the (default, most strict) sequential consistency ordering typically does nothing beyond ensuring that instructions are executed in a specific order; no bus locking or the like is required, because of the strongly ordered memory model. Thus, aside from guaranteeing that the value is actually read from memory rather than cached in a register and reused, using atomics this way on x86 is essentially free, and on non-x86 machines it makes your code correct (which it otherwise wouldn't be).
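For reference, here is a minimal, self-contained sketch of the same acquire/release stop flag with real threads. The function name run_until_passes and the passes counter are hypothetical, added only so the example has something observable to return; the counter stands in for "do stuff". This is one way to wire it up, not the only one.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical helper: starts a worker loop guarded by an atomic stop flag,
// lets it complete at least `min_passes` passes, then shuts it down.
// Returns the total number of passes the worker completed.
std::uint64_t run_until_passes(std::uint64_t min_passes) {
    std::atomic<bool> run{true};
    std::atomic<std::uint64_t> passes{0};  // stand-in for "do stuff"

    // "thread 1": loops until the flag is cleared.
    std::thread worker([&] {
        while (run.load(std::memory_order_acquire)) {
            passes.fetch_add(1, std::memory_order_relaxed);
        }
    });

    // "thread 2": waits until it's time to shut down, then clears the flag.
    while (passes.load(std::memory_order_relaxed) < min_passes) {
        std::this_thread::yield();
    }
    run.store(false, std::memory_order_release);

    worker.join();  // the worker is guaranteed to see the store and exit

    const std::uint64_t total = passes.load(std::memory_order_relaxed);
    assert(total >= min_passes);  // sanity check on the shutdown logic
    return total;
}
```

Calling run_until_passes(1000) returns once the worker has done at least 1000 passes and has observed the flag change. The worker may run a few extra passes between the store and the moment it next checks the flag, which matches the "one pass early or late" tolerance mentioned in the question.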