
How are atomic operations implemented at a hardware level?

I get that at the assembly language level instruction set architectures provide compare and swap and similar operations. However, I don't understand how the chip is able to provide these guarantees.

As I imagine it, the execution of the instruction must

  1. Fetch a value from memory
  2. Compare the value
  3. Depending on the comparison, possibly store another value in memory

What prevents another core from accessing the memory address after the first has fetched it but before it sets the new value? Does the memory controller manage this?
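For concreteness, here is roughly the operation I have in mind, written with C11's atomic_compare_exchange_strong (just a sketch; the function and variable names are mine):

    #include <stdatomic.h>
    #include <stdbool.h>

    /* Sketch of the three steps above. The whole thing -- fetch, compare,
     * conditional store -- is supposed to happen as one indivisible step. */
    bool try_claim(atomic_int *flag)
    {
        int expected = 0;   /* 1. the value we expect to fetch from memory */
        /* 2. compare *flag with expected, and
         * 3. store 1 only if the comparison succeeded -- all atomically   */
        return atomic_compare_exchange_strong(flag, &expected, 1);
    }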

edit: If the x86 implementation is secret, I'd be happy to hear how any processor family implements it.

asked Feb 07 '13 by Alexander Duchene


People also ask

How do atomic operations work?

During an atomic operation, a processor can read and write a location within a single, indivisible transaction. This way, no other processor or I/O device can read or write that memory location until the atomic operation has finished.

What is an atomic operation example?

An example of an atomic operation is the execution of a single instruction: an instruction fed to the execution unit usually cannot be interrupted in the middle. A statement in a high-level language, however, compiles to multiple instructions, which is the root cause of non-atomic operations.

What is atomic in operating systems?

In computer programming, atomic describes a unitary action or object that is essentially indivisible, unchangeable, whole, and irreducible.

What is atomic operation in microcontroller?

In computing, an atomic instruction or operation is one that cannot (or should not) be interrupted, i.e. have its lower-level steps separated, while it is being executed; otherwise there is a risk of unwanted side effects. Disabling interrupts is the crudest way to force a series of instructions to behave almost as if they were one.
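For example, on a small microcontroller a multi-byte counter shared with an interrupt handler is often read this way (a rough AVR sketch assuming avr-libc's cli()/sei() and that interrupts were enabled on entry):

    #include <stdint.h>
    #include <avr/interrupt.h>   /* avr-libc: cli() masks interrupts, sei() re-enables them */

    volatile uint32_t tick_count;    /* updated from a timer interrupt handler */

    /* A 32-bit read takes several instructions on an 8-bit AVR, so briefly
     * disable interrupts to keep the ISR from updating the counter mid-read. */
    uint32_t read_ticks(void)
    {
        cli();                           /* interrupts off             */
        uint32_t snapshot = tick_count;  /* no ISR can interleave here */
        sei();                           /* interrupts back on         */
        return snapshot;
    }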


2 Answers

Here is an article over at software.intel.com that sheds some light on user-level locks:

User level locks involve utilizing the atomic instructions of the processor to atomically update a memory space. The atomic instructions involve utilizing a lock prefix on the instruction and having the destination operand assigned to a memory address. The following instructions can run atomically with a lock prefix on current Intel processors: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCH8B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG. [...] On most instructions a lock prefix must be explicitly used except for the xchg instruction where the lock prefix is implied if the instruction involves a memory address.

In the days of Intel 486 processors, the lock prefix used to assert a lock on the bus, along with a large hit in performance. Starting with the Intel Pentium Pro architecture, the bus lock is transformed into a cache lock. A lock will still be asserted on the bus in the most modern architectures if the lock resides in uncacheable memory or if the lock extends beyond a cache line boundary, splitting cache lines. Both of these scenarios are unlikely, so most lock prefixes will be transformed into a cache lock, which is much less expensive.
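To illustrate what such a lock-prefixed instruction looks like, here is a sketch in GCC/Clang inline assembly (not code from the article; real programs would normally use the compiler's atomic builtins):

    /* Atomically add val to *addr on x86 by emitting a LOCK-prefixed ADD. */
    static inline void locked_add(int *addr, int val)
    {
        __asm__ __volatile__("lock addl %1, %0"
                             : "+m"(*addr)      /* destination operand is a memory address */
                             : "ir"(val)        /* source: immediate or register           */
                             : "memory", "cc");
    }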

So what prevents another core from accessing the memory address? The cache coherency protocol already manages access rights for cache lines. So if a core has (temporary) exclusive access rights to a cache line, no other core can access that cache line. To access that cache line the other core has to obtain access rights first, and the protocol to obtain those rights involves the current owner. In effect, the cache coherency protocol prevents other cores from accessing the cache line silently.
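The observable effect is easy to demonstrate: two threads doing locked read-modify-writes on the same location never lose an update. A small self-contained sketch (the names are arbitrary; compile with -pthread):

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_int counter;

    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 1000000; i++)
            atomic_fetch_add(&counter, 1);   /* a lock-prefixed add on x86 */
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, worker, NULL);
        pthread_create(&b, NULL, worker, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("%d\n", atomic_load(&counter));   /* always 2000000 */
        return 0;
    }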

If the locked access is not bound to a single cache line, things get more complicated. There are all kinds of nasty corner cases, like locked accesses over page boundaries, etc. Intel does not disclose the details, and they probably use all kinds of tricks to make locks faster.

answered Sep 22 '22 by Mackie Messer


An example implementation of this is LL/SC (load-link/store-conditional), where the processor provides a pair of extra instructions that are used to build atomic operations. On the memory side of it is cache coherency; one of the most popular cache coherency protocols is the MESI protocol.
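As a rough sketch of how LL/SC is used, here is an atomic increment built from AArch64's exclusive load/store pair, wrapped in GCC/Clang inline assembly (illustrative only; real code would use the compiler's atomic builtins):

    /* Atomic increment via load-exclusive / store-exclusive (one flavour of LL/SC). */
    static inline void llsc_increment(int *addr)
    {
        int newval, failed;
        __asm__ __volatile__(
            "1: ldxr  %w0, [%2]      \n"   /* load-linked: read value, start monitoring addr  */
            "   add   %w0, %w0, #1   \n"   /* compute the new value                           */
            "   stxr  %w1, %w0, [%2] \n"   /* store-conditional: fails if addr was written to */
            "   cbnz  %w1, 1b        \n"   /* nonzero status means the reservation was lost   */
            : "=&r"(newval), "=&r"(failed)
            : "r"(addr)
            : "memory");
    }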

answered Sep 22 '22 by Josh