How do atomic operations work, under the hood?
Are atomic operations so-called "wait-free"?
I'm looking for a description of the "lowest common denominator" of atomic operations. What do all atomic operations have in common?
During an atomic operation, a processor reads and writes a memory location within the same bus transaction. As a result, no other processor or input/output device can read or write that location until the atomic operation has finished.
Atomic operations perform a non-interruptible read-modify-write sequence in a single instruction. Atomic instructions can perform simple arithmetic or logical operations on the specified memory location and return the previous or the updated value to the CPU, depending on the instruction. These instructions make locking a shared resource easier and faster.
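As a minimal sketch of such a read-modify-write, here is how it might look with C++11's std::atomic. The function name fetch_add_demo is hypothetical and exists only for illustration. Note that fetch_add returns the value held before the addition, not the new one.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical demo: one atomic read-modify-write on a single location.
int fetch_add_demo() {
    std::atomic<int> counter{5};

    // fetch_add reads the current value, adds 3, and writes the result back
    // as one indivisible step. It returns the value seen BEFORE the addition.
    int previous = counter.fetch_add(3);
    assert(previous == 5);

    // The location now holds the updated value.
    return counter.load();
}
```

Calling fetch_add_demo() returns 8: the old value 5 was handed back by fetch_add, and 5 + 3 was stored in its place.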
A task performed by a computer is said to be atomic when it is indivisible: it can't be broken into smaller steps. Atomicity is an important property of multithreaded operations: because an atomic operation is indivisible, a thread can never observe or interrupt it halfway through while another thread is performing it.
The C++11 atomic library provides components for fine-grained atomic operations, allowing for lockless concurrent programming. Each atomic operation is indivisible with regard to any other atomic operation that involves the same object. Atomic objects are free of data races.
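To illustrate the "free of data races" point, here is a small sketch in which several threads increment one shared std::atomic counter without any lock. The helper name count_with_threads and its parameters are assumptions made for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: starts num_threads threads that each increment a
// shared atomic counter per_thread times, then returns the final count.
long count_with_threads(int num_threads, int per_thread) {
    assert(num_threads >= 0 && per_thread >= 0);

    std::atomic<long> counter{0};
    std::vector<std::thread> workers;

    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                // Each increment is indivisible, so no update is ever lost.
                counter.fetch_add(1);
            }
        });
    }

    // Wait for every worker to finish before reading the result.
    for (auto& w : workers) {
        w.join();
    }
    return counter.load();
}
```

With a plain long instead of std::atomic<long>, the same code would contain a data race and could lose increments. With the atomic counter, count_with_threads(4, 10000) always returns 40000.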
If we're talking about the atomic operations used by synchronization mechanisms (mutexes, semaphores, etc.), they have to be supported by the OS on single-CPU machines and by the hardware on multi-CPU machines.
On a single-CPU machine, an instruction sequence can be made "atomic", in the sense that it cannot be interrupted in the middle (e.g. by the timer interrupt that triggers a switch to another thread), by disabling interrupts. This means that synchronization primitives can be written quite simply once the CPU enters kernel mode and can access the interrupt control registers.
On a multi-core machine it is more complex. There the instructions have to be truly atomic across all CPUs. On typical cache-coherent hardware, the core executing the atomic instruction must gain exclusive ownership of the cache line holding the target address, and the other cores must invalidate their copies of that line. This coherence traffic between cores is what makes synchronization so expensive on these architectures.
The instructions themselves take a form such as "bit test and set" in one operation. This is enough to implement a simple mutex. Even if two threads on different CPUs/cores execute the test-and-set operation at the same time on the same address, only one of them will get the result that the bit was unset and is now set. That thread is the one that owns the mutex.
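The test-and-set mutex described above can be sketched in C++11 with std::atomic_flag, whose test_and_set member sets the flag and returns its previous value in one atomic step. The SpinLock class and the guarded_count helper are hypothetical names used only for this illustration. This is a bare spinlock, not how a production mutex is implemented.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical minimal mutex built on a single test-and-set flag.
class SpinLock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;  // starts in the clear state

public:
    void lock() {
        // test_and_set atomically sets the flag and returns its old value.
        // Only the thread that sees "false" changed it from clear to set,
        // so only that thread owns the lock. Everyone else keeps retrying.
        while (flag.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();  // let other threads run while waiting
        }
    }

    void unlock() {
        // Clearing the flag releases the lock for the next test_and_set.
        flag.clear(std::memory_order_release);
    }
};

// Hypothetical helper: increments a plain (non-atomic) counter from several
// threads, using the spinlock to make each increment mutually exclusive.
long guarded_count(int num_threads, int per_thread) {
    assert(num_threads >= 0 && per_thread >= 0);

    SpinLock lock;
    long counter = 0;  // protected by lock
    std::vector<std::thread> workers;

    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&lock, &counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

Because only one thread at a time can win the test-and-set, guarded_count(4, 10000) returns 40000 even though counter itself is an ordinary long.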