If I have code like `a = a + 1`, I understand that there are multiple CPU-level operations required to execute it. But how does defining `a` as `std::atomic<int>` make these multiple transactions atomic?

Does it change the way the CPU instructions are executed? I'd assume it would somehow have to shrink the operation down to a single instruction, so that a context switch cannot cause unreliable results, but how does it do that?
If the compiler can always create code like that, why not always do that?
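To make the problem concrete, here is a minimal sketch (the names are illustrative, not from any particular source) showing why a plain `int` increment is not atomic while `std::atomic<int>` is:

```cpp
// Two threads each increment both counters 100000 times. The plain int
// is a data race (technically undefined behavior) and typically loses
// updates; the std::atomic<int> always ends at exactly 200000.
#include <atomic>
#include <iostream>
#include <thread>

int plain = 0;                  // unsynchronized: racing on this is UB
std::atomic<int> atomic_cnt{0};

void work() {
    for (int i = 0; i < 100000; ++i) {
        ++plain;       // load, add, store: can interleave with the other thread
        ++atomic_cnt;  // single atomic read-modify-write
    }
}

int main() {
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    std::cout << "plain:  " << plain             << '\n'   // often < 200000
              << "atomic: " << atomic_cnt.load() << '\n';  // always 200000
}
```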
If there is a suitable atomic instruction for the requested operation, the compiler issues that instruction; otherwise the operation is implemented with a locking mechanism.
There is a static member constant (C++17) that tells you whether an atomic type is always lock-free: `is_always_lock_free`.
Be aware that if this constant is `false`, at least some operations on the type are not lock-free (not necessarily all of them). These lock-based operations will usually be more expensive than lock-free atomic operations, which are in turn more expensive than ordinary non-atomic operations.
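For example, a minimal sketch of querying this, assuming a C++17 compiler:

```cpp
// is_always_lock_free is a compile-time constant of the type;
// is_lock_free() is a per-object runtime query (it can additionally
// depend on the runtime environment, e.g. the host CPU).
#include <atomic>
#include <iostream>

int main() {
    std::atomic<int> a{0};
    std::cout << std::boolalpha
              << "always lock-free:      " << std::atomic<int>::is_always_lock_free << '\n'
              << "this object lock-free: " << a.is_lock_free() << '\n';
}
```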
Not all hardware supports all combinations of atomic operations, so different compiler backends will generate different solutions, sometimes with a single atomic operation, sometimes with a locking mechanism. So the compiler cannot always produce such single-instruction code.
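As an illustration, here is a sketch using a hypothetical 32-byte type `Big`: on common targets `std::atomic<Big>` is typically not lock-free, yet its individual operations are still atomic (with GCC you may need to link against libatomic, e.g. `-latomic`):

```cpp
// A trivially copyable type wider than the atomic instructions on typical
// hardware. std::atomic<Big> still compiles and its operations are still
// atomic, but the implementation usually falls back to an internal lock.
#include <atomic>
#include <iostream>

struct Big {
    long data[4];  // 32 bytes on common 64-bit targets
};

int main() {
    std::atomic<Big> b{Big{}};
    std::cout << std::boolalpha
              << "Big lock-free: " << b.is_lock_free() << '\n';  // usually false

    Big v = b.load();   // atomic, possibly via the library's internal lock
    v.data[0] += 1;
    b.store(v);         // atomic store; but load-modify-store is NOT one transaction
}
```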
> [B]ut how does defining `a` as `std::atomic` make these multiple transactions atomic?
It doesn't make the "multiple transactions" atomic in an arbitrary expression (e.g., it won't help in your `a = a + 1` example, which remains a separate load and store). Rather, you need to use an operation like `a++` which is guaranteed to be a single atomic read-modify-write. In that case, how it is implemented depends on the compiler and hardware, but the most common strategies are:

- a dedicated atomic instruction, such as x86's `lock add`;
- a retry loop built around a CAS (compare-and-swap) instruction;
- a retry loop built around LL/SC (load-linked/store-conditional) instructions.

You may be able to check the behavior on your compiler and hardware combination by examining the generated assembly. Sometimes this is tricky because the compiler may call into a function implemented in a runtime library, in which case you'll have to examine the source or disassembly for that function. This means that the same binary can have different implementations of atomic operations on different hosts, if the runtime library implementation differs!
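As a sketch of that distinction (the function names are mine, for illustration):

```cpp
// On std::atomic<int>, a++ / fetch_add is a single atomic read-modify-write,
// while a = a + 1 compiles to an atomic load followed by a separate atomic
// store. Another thread can run between those two steps.
#include <atomic>

std::atomic<int> a{0};

void one_atomic_transaction() {
    a++;              // atomic RMW; equivalent to a.fetch_add(1)
    a.fetch_add(1);   // the same thing, spelled explicitly
}

void not_one_transaction() {
    a = a + 1;        // atomic load of a, plus 1, then an atomic store:
                      // each step is atomic, but the combination is not
}
```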
> If the compiler can always create code like that, why not always do that?
The compiler doesn't always generate these because they are expensive at the hardware level. For example, a normal (non-atomic) addition usually takes 1 cycle or less² on most modern CPUs¹, while an atomic addition may take 15 to 100 cycles. Approaches that use CAS or LL/SC are generally even slower and require retry loops, bloating the binary size (a sketch of such a loop follows the footnotes).
¹ Up to perhaps a handful of cycles on some microcontroller-class CPUs, but there atomic operations are often less relevant since there may not be multiple cores.
² It depends on how you measure it: an addition usually takes one cycle to complete (latency), but you can often execute more than one independent addition in the same cycle. For example, modern Intel CPUs can execute four in one cycle.
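To make those retry loops concrete, here is a sketch (my own illustration with a hypothetical helper name, not code from the answer) of how an atomic add can be built from `compare_exchange_weak` when no direct atomic-add instruction exists:

```cpp
// A CAS-based retry loop. On hardware without a direct atomic-add
// instruction, the compiler or runtime library may emit something
// morally equivalent to this for fetch_add.
#include <atomic>

int add_with_cas(std::atomic<int>& a, int delta) {
    int expected = a.load(std::memory_order_relaxed);
    // Retry until no other thread modified `a` between our load and the CAS.
    while (!a.compare_exchange_weak(expected, expected + delta)) {
        // On failure, compare_exchange_weak reloaded `expected` with the
        // current value of `a`; loop around and try again.
    }
    return expected;  // the value before our addition, like fetch_add
}
```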