Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does std::atomic ensure atomicity

Tags:

c++

atomic

If I have a code a = a + 1, now I understand that there are multiple CPU level operations required to execute this, but how does definining a as std::atomic<int> make these multiple transactions atomic?

Does it change the way CPU instruction are executed. I'd assume that it will have to shrink the number of instructions to 1 somehow, so that any context switching does not cause unreliable results, but how does it do that?

If the compiler can always create code like that, why not always do that?

like image 785
Kraken Avatar asked Jan 02 '23 09:01

Kraken


2 Answers

If there is an atomic instruction that can be issued (for a known possible atomic operation), then this atomic instruction is issued, otherwise it's going to be with a lock mechanism.

There is a function (C++17) that tells you if the atomic type is always lock-free or not: is_always_lock_free. Be aware that if this function returns false, at least some operations are not lock-free (not necessarily all of them). These non lock-free operations will usually be more expensive than atomic operations (themselves more expensive than traditional operations).

Not all hardware support all combinations of atomic operations, so different compiler backends will generate different solutions, sometimes with a single atomic operation, sometimes with a locking mechanism.

So it cannot always create such 1-instruction code.

like image 122
Matthieu Brucher Avatar answered Jan 09 '23 21:01

Matthieu Brucher


[B]ut how does definining a as std::atomic make these multiple transactions atomic?

It doesn't make the "multiple transactions" atomic in an arbitrary expression (e.g., it won't help in your a = a + 1 example). Rather, you need to use an operation like a++ which is guaranteed to be atomic. In that case, how it is implemented depends on the compiler and hardware, but the most common strategies are:

  • A single instruction atomic operation is used, which can automically increment the value in memory. On x86, this would be something like the lock add instruction.
  • Some type of compare-and-exchange (CAS) or load-linked store-conditional (LL-SC) loop is used to repeatedly attempt to do the increment in an atomic fashion. You can see LL-SC in action on MIPS64.
  • Finally, on platforms that don't support such operations, or where the data type is incompatible with those instructions, a lock can obtained to exclude any concurrent access while performing the operation with regular non-atomic instructions. Most mainstream platforms won't need to fall back on this for atomic types, but you can still see an example here on an old ARM compiler.

You may be able check the behavior on your compiler and hardware combination by examining the generated assembly. Sometimes this is tricky because the compiler may call into a function implemented in a runtime library, in which case you'll have to examine the source or disassembly for that function. This means that the same binary can have different implements for atomic operations on different hosts, if the runtime library implementation differs!

If the compiler can always create code like that, why not always do that?

The compiler doesn't always generate these because they are expensive at a hardware level. For example, a normal (non-atomic) addition usually takes 1 cycle or less2 on most modern CPUs1, while an atomic addition may take 15 to 100 cycles. Approaches that use CAS or LL-SC are generally even slower and require retry loops, bloating the binary size.


1 Up to perhaps a handful of cycles on some micro-controller class CPUs - but there atomic operations are often less relevant since there may not be multiple cores.

2 It depends how you measure it - an addition usually takes one cycle to complete (latency), but you can often execute more than one independent addition in the same cycle. For example, modern Intel CPUs can execute four in one cycle.

like image 30
BeeOnRope Avatar answered Jan 09 '23 19:01

BeeOnRope