Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why can the MESI protocol not guarantee atomicity of CMPXCHG on x86 without the LOCK prefix?

I understand that the MESI protocol successfully guarantees the same view of memory (caches) for different cores. My question comes from the fact that during writing MESI guarantees that the cache is exclusively owned by a CPU and then atomic CMPXCHG just compares and exchanges values atomically. So why do we need to use the LOCK instruction and thus lock the cache line when we already have that guarantee from the MESI protocol?

like image 813
shota silagadze Avatar asked May 05 '19 19:05

shota silagadze


People also ask

Is x86 Cmpxchg Atomic if so why does it need lock?

On a single-CPU system, cmpxchg is atomic with respect to other threads, or any other code running on the same CPU core. (But not to "system" observers like a memory-mapped I/O device, or a device doing DMA reads of normal memory, so lock cmpxchg was relevant even on uniprocessor CPU designs).

What is the need of MESI protocol in processor organization?

The MESI protocol is a formal mechanism for controlling cache coherency using snooping techniques. Its acronym stands for modified, exclusive, shared, invalid and refers to the states that cached data can take. The MESI protocol is a formal mechanism for controlling cache coherency using snooping techniques.

Does coherence protocol affect performance positively?

Yes, implementing MESI(f) or any coherence protocol will have an impact on performance, but that impact is actually positive when compared to the case where no coherence protocol exists. In that case, every read/write would need to go to main memory (ie, your application will be 100's of times SLOWER).

What is exclusive state in MESI protocol?

In that sense the Exclusive state is an opportunistic optimization: If the CPU wants to modify a cache line that is in state S, a bus transaction is necessary to invalidate all other cached copies. State E enables modifying a cache line with no bus transaction.


1 Answers

atomic CMPXCHG just compares and exchanges values atomically

No, the cache-access hardware doesn't implement CMPXCHG as a single-cycle inherently-atomic operation. It's synthesized out of multiple uops that load and separately store.

If that's how regular CMPXCHG worked, your reasoning would be correct. But regular CMPXCHG is not atomic (for observers on other cores).


lock cmpxchg decodes to multiple uops that keep the cache-line "locked" from the load to the store, turning it into a single atomic transaction as far as any other observers in the system can see. (i.e. delay responding to MESI invalidate or share requests for that line until after the store commits). It also makes it a full memory barrier.


Without lock, CMPXCHG decodes to multiple uops that load, check for equality stuff, and then either store a new value or not according to the compare result. As far as atomicity, it's the same as add [mem], edx, which uses the ALU for addition in between load and store uops. i.e. it's not atomic, except on the same core with respect to interrupts (because interrupts can only happen at an instruction boundary).

The load and store are each separately atomic, but they aren't a single atomic RMW transaction. If another core invalidates our copy of the cache line and stores a new value between our load and our store, our store will step on the other store. And that other store will appear in the global order of operations on that cache line between our load and store, violating the definition of "atomic" = indivisible.

  • Can num++ be atomic for 'int num'? why add [mem], edx isn't atomic, and how lock works to make it atomic.
  • Is x86 CMPXCHG atomic, if so why does it need LOCK? use-cases for cmpxchg without lock: uniprocessor machines.
like image 54
Peter Cordes Avatar answered Oct 03 '22 03:10

Peter Cordes