Why can the MESI protocol not guarantee atomicity of CMPXCHG on x86 without the LOCK prefix?

I understand that the MESI protocol guarantees that different cores see the same view of memory (caches). My question comes from the fact that during a write, MESI guarantees that the cache line is exclusively owned by one core, and then atomic CMPXCHG just compares and exchanges values atomically. So why do we need the LOCK prefix, and thus need to lock the cache line, when we already have that guarantee from the MESI protocol?

asked May 05 '19 19:05

shota silagadze

1 Answer

"atomic CMPXCHG just compares and exchanges values atomically"

No, the cache-access hardware doesn't implement CMPXCHG as a single-cycle inherently-atomic operation. It's synthesized out of multiple uops that load and separately store.

If that's how regular CMPXCHG worked, your reasoning would be correct. But regular CMPXCHG is not atomic (for observers on other cores).

lock cmpxchg decodes to multiple uops that keep the cache-line "locked" from the load to the store, turning it into a single atomic transaction as far as any other observers in the system can see. (i.e. delay responding to MESI invalidate or share requests for that line until after the store commits). It also makes it a full memory barrier.
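
As a rough illustration of what the locked form buys you, here is a minimal Rust sketch of a compare-and-swap retry loop. It assumes an x86-64 target, where `AtomicU32::compare_exchange` typically compiles to `lock cmpxchg`; the helper names `cas_increment` and `run_cas_counter` are made up for this example and are not from the question or answer.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// Increment `counter` once with a compare-and-swap retry loop.
/// On x86-64 the `compare_exchange` call typically becomes `lock cmpxchg`
/// (an assumption about code generation, not something this code checks).
fn cas_increment(counter: &AtomicU32) {
    let mut seen = counter.load(Ordering::Relaxed);
    loop {
        match counter.compare_exchange(
            seen,
            seen.wrapping_add(1),
            Ordering::SeqCst,
            Ordering::Relaxed,
        ) {
            // Our load..store pair was not interrupted by another core's store.
            Ok(_) => return,
            // Another thread changed the value first; retry with what it wrote.
            Err(actual) => seen = actual,
        }
    }
}

/// Hypothetical driver: `threads` threads each perform `per_thread` increments.
fn run_cas_counter(threads: u32, per_thread: u32) -> u32 {
    let counter = Arc::new(AtomicU32::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    cas_increment(&counter);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    counter.load(Ordering::SeqCst)
}
```

Calling `run_cas_counter(4, 10_000)` should return exactly 40000: because every successful compare-and-swap is a single indivisible read-modify-write, no increment is lost no matter how the threads interleave.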

Without lock, CMPXCHG decodes to multiple uops that load, compare the loaded value for equality, and then either store a new value or not according to the compare result. As far as atomicity goes, it's the same as add [mem], edx, which uses the ALU for addition in between its load and store uops, i.e. it's not atomic, except on the same core with respect to interrupts (because interrupts can only happen at an instruction boundary).
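
The add [mem], edx comparison can be approximated in Rust. This is a sketch under assumptions: `split_increment` uses a separate atomic load and atomic store to stand in for the unlocked instruction's load and store uops, while `locked_increment` uses `fetch_add`, which on x86-64 typically compiles to a `lock`-prefixed add or xadd. All function names here are hypothetical.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// Stand-in for an unlocked read-modify-write: the load and the store are
/// each atomic on their own, but another thread can store in between them.
fn split_increment(counter: &AtomicU32) {
    let old = counter.load(Ordering::Relaxed);
    counter.store(old.wrapping_add(1), Ordering::Relaxed);
}

/// Stand-in for the locked form: one indivisible read-modify-write.
fn locked_increment(counter: &AtomicU32) {
    counter.fetch_add(1, Ordering::SeqCst);
}

/// Hypothetical driver: run `step` `per_thread` times on each of `threads` threads
/// and return the final counter value.
fn run_increments(threads: u32, per_thread: u32, step: fn(&AtomicU32)) -> u32 {
    let counter = Arc::new(AtomicU32::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    step(&counter);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    counter.load(Ordering::SeqCst)
}
```

With `locked_increment` the result always equals `threads * per_thread`. With `split_increment` the result can be anything up to that total; how many updates are lost depends on timing, so it varies from run to run and may occasionally be zero.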

The load and store are each separately atomic, but they aren't a single atomic RMW transaction. If another core invalidates our copy of the cache line and stores a new value between our load and our store, our store will step on the other store. And that other store will appear in the global order of operations on that cache line between our load and store, violating the definition of "atomic" = indivisible.
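
The interleaving described above can be replayed deterministically in a toy model. The Rust sketch below is not how the hardware is built; it simply treats a cache line as a `u32` and splits an unlocked CMPXCHG into a load step and a compare-and-store step so the two "cores" can be interleaved by hand. The type `UnlockedCmpxchg` and both scenario functions are invented for this illustration.

```rust
/// Toy model of one core executing CMPXCHG without LOCK as separate steps.
struct UnlockedCmpxchg {
    expected: u32,       // value the core expects to find (EAX in the real instruction)
    new_value: u32,      // value to store if the compare succeeds
    loaded: Option<u32>, // result of the load step, consumed by the store step
}

impl UnlockedCmpxchg {
    fn new(expected: u32, new_value: u32) -> Self {
        Self { expected, new_value, loaded: None }
    }

    /// Load step: read the current contents of the line.
    fn load(&mut self, line: u32) {
        self.loaded = Some(line);
    }

    /// Compare and store steps: decide using the possibly stale loaded value.
    /// Returns whether this core believes its compare-and-swap succeeded.
    fn compare_and_store(&mut self, line: &mut u32) -> bool {
        let seen = self.loaded.take().expect("load step must run first");
        if seen == self.expected {
            *line = self.new_value;
            true
        } else {
            false
        }
    }
}

/// Core B's load and store both land between core A's load and store.
fn interleaved_unlocked() -> (u32, bool, bool) {
    let mut line = 0;
    let mut a = UnlockedCmpxchg::new(0, 1);
    let mut b = UnlockedCmpxchg::new(0, 2);
    a.load(line);
    b.load(line);
    let b_ok = b.compare_and_store(&mut line); // line becomes 2
    let a_ok = a.compare_and_store(&mut line); // A steps on B's store: line becomes 1
    (line, a_ok, b_ok)
}

/// Each core's load..store runs as one indivisible unit, as with LOCK.
fn serialized_locked() -> (u32, bool, bool) {
    let mut line = 0;
    let mut a = UnlockedCmpxchg::new(0, 1);
    let mut b = UnlockedCmpxchg::new(0, 2);
    a.load(line);
    let a_ok = a.compare_and_store(&mut line); // line becomes 1
    b.load(line);
    let b_ok = b.compare_and_store(&mut line); // B sees 1, compare fails
    (line, a_ok, b_ok)
}
```

In the interleaved scenario both cores report success even though only one of them can have swapped from 0, and B's store is silently overwritten. In the serialized scenario exactly one core wins, which is the behaviour a compare-and-swap is supposed to provide.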

Related:
Can num++ be atomic for 'int num'? (why add [mem], edx isn't atomic, and how lock works to make it atomic)
Is x86 CMPXCHG atomic, if so why does it need LOCK? (use-cases for cmpxchg without lock: uniprocessor machines)

answered Oct 03 '22 03:10

Peter Cordes

Tags: cpu-architecture, x86, atomic, compare-and-swap, mesi
