
How are atomic operations implemented at a hardware level?

I get that at the assembly language level instruction set architectures provide compare and swap and similar operations. However, I don't understand how the chip is able to provide these guarantees.

As I imagine it, the execution of the instruction must

  1. Fetch a value from memory
  2. Compare the value
  3. Depending on the comparison, possibly store another value in memory

What prevents another core from accessing the memory address after the first has fetched it but before it sets the new value? Does the memory controller manage this?
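For concreteness, here is roughly the operation I have in mind, written with C11's atomic_compare_exchange_strong (just a sketch; the function and variable names are mine):

    #include <stdatomic.h>
    #include <stdbool.h>

    /* Sketch of the three steps above. The whole thing -- fetch, compare,
     * conditional store -- is supposed to happen as one indivisible step. */
    bool try_claim(atomic_int *flag)
    {
        int expected = 0;   /* 1. the value we expect to fetch from memory */
        /* 2. compare *flag with expected, and
         * 3. store 1 only if the comparison succeeded -- all atomically   */
        return atomic_compare_exchange_strong(flag, &expected, 1);
    }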

edit: If the x86 implementation is secret, I'd be happy to hear how any processor family implements it.

asked Feb 07 '13 by Alexander Duchene


People also ask

How do atomic operations work?

During an atomic operation, a processor can read and write a location within a single, indivisible transaction. This way, no other processor or I/O device can read or write that memory location until the atomic operation has finished.

What is an atomic operation example?

An example of an atomic operation is the execution of a single instruction: an instruction fed to the execution unit usually cannot be interrupted in the middle. A statement in a high-level language, however, compiles to multiple instructions, which is the root cause of non-atomic operations.

What is atomic in operating systems?

In computer programming, atomic describes a unitary action or object that is essentially indivisible, unchangeable, whole, and irreducible.

What is atomic operation in microcontroller?

In computing, an atomic instruction or operation is one that cannot (or should not) be interrupted, i.e. have its lower-level steps separated, while it is being executed; otherwise there is a risk of unwanted side effects. Disabling interrupts is the crudest way to force a series of instructions to behave almost as if they were one.
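For example, on a small microcontroller a multi-byte counter shared with an interrupt handler is often read this way (a rough AVR sketch assuming avr-libc's cli()/sei() and that interrupts were enabled on entry):

    #include <stdint.h>
    #include <avr/interrupt.h>   /* avr-libc: cli() masks interrupts, sei() re-enables them */

    volatile uint32_t tick_count;    /* updated from a timer interrupt handler */

    /* A 32-bit read takes several instructions on an 8-bit AVR, so briefly
     * disable interrupts to keep the ISR from updating the counter mid-read. */
    uint32_t read_ticks(void)
    {
        cli();                           /* interrupts off             */
        uint32_t snapshot = tick_count;  /* no ISR can interleave here */
        sei();                           /* interrupts back on         */
        return snapshot;
    }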


2 Answers

Here is an article over at software.intel.com that sheds some light on user-level locks:

User level locks involve utilizing the atomic instructions of the processor to atomically update a memory space. The atomic instructions involve utilizing a lock prefix on the instruction and having the destination operand assigned to a memory address. The following instructions can run atomically with a lock prefix on current Intel processors: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCH8B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG. [...] On most instructions a lock prefix must be explicitly used except for the xchg instruction where the lock prefix is implied if the instruction involves a memory address.

In the days of Intel 486 processors, the lock prefix used to assert a lock on the bus, along with a large hit in performance. Starting with the Intel Pentium Pro architecture, the bus lock is transformed into a cache lock. A lock will still be asserted on the bus in the most modern architectures if the lock resides in uncacheable memory or if the lock extends beyond a cache line boundary, splitting cache lines. Both of these scenarios are unlikely, so most lock prefixes will be transformed into a cache lock, which is much less expensive.
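To illustrate what such a lock-prefixed instruction looks like, here is a sketch in GCC/Clang inline assembly (not code from the article; real programs would normally use the compiler's atomic builtins):

    /* Atomically add val to *addr on x86 by emitting a LOCK-prefixed ADD. */
    static inline void locked_add(int *addr, int val)
    {
        __asm__ __volatile__("lock addl %1, %0"
                             : "+m"(*addr)      /* destination operand is a memory address */
                             : "ir"(val)        /* source: immediate or register           */
                             : "memory", "cc");
    }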

So what prevents another core from accessing the memory address? The cache coherency protocol already manages access rights for cache lines. So if a core has (temporary) exclusive access rights to a cache line, no other core can access that cache line. To access that cache line the other core has to obtain access rights first, and the protocol to obtain those rights involves the current owner. In effect, the cache coherency protocol prevents other cores from accessing the cache line silently.
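The observable effect is easy to demonstrate: two threads doing locked read-modify-writes on the same location never lose an update. A small self-contained sketch (the names are arbitrary; compile with -pthread):

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_int counter;

    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 1000000; i++)
            atomic_fetch_add(&counter, 1);   /* a lock-prefixed add on x86 */
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, worker, NULL);
        pthread_create(&b, NULL, worker, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("%d\n", atomic_load(&counter));   /* always 2000000 */
        return 0;
    }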

If the locked access is not bound to a single cache line, things get more complicated. There are all kinds of nasty corner cases, like locked accesses over page boundaries, etc. Intel does not disclose the details, and they probably use all kinds of tricks to make locks faster.

answered Sep 22 '22 by Mackie Messer


An example implementation of this is LL/SC (load-link/store-conditional), where the processor provides a pair of extra instructions that are used to build atomic operations. On the memory side of it is cache coherency; one of the most popular cache coherency protocols is the MESI protocol.
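As a rough sketch of how LL/SC is used, here is an atomic increment built from AArch64's exclusive load/store pair, wrapped in GCC/Clang inline assembly (illustrative only; real code would use the compiler's atomic builtins):

    /* Atomic increment via load-exclusive / store-exclusive (one flavour of LL/SC). */
    static inline void llsc_increment(int *addr)
    {
        int newval, failed;
        __asm__ __volatile__(
            "1: ldxr  %w0, [%2]      \n"   /* load-linked: read value, start monitoring addr  */
            "   add   %w0, %w0, #1   \n"   /* compute the new value                           */
            "   stxr  %w1, %w0, [%2] \n"   /* store-conditional: fails if addr was written to */
            "   cbnz  %w1, 1b        \n"   /* nonzero status means the reservation was lost   */
            : "=&r"(newval), "=&r"(failed)
            : "r"(addr)
            : "memory");
    }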

answered Sep 22 '22 by Josh