LOCK prefix vs MESI protocol?

Tags: x86, multithreading, locking, cpu, mesi

What is the purpose of the x86 LOCK prefix, if the MESI protocol prevents other cores from writing to "exclusive"-ly owned data anyway?

I am getting a little confused between what LOCK provides and what MESI provides?

I understand the MESI protocol is about ensuring the cores all see a consistent state of memory, but as I understand, it also prevents cores from writing to memory which another core is already writing to?

420

asked Apr 26 '15 16:04

user997112

1 Answer

The MESI protocol makes the memory caches effectively invisible. This means that multithreaded programs don't have to worry about a core reading stale data from them or two cores writing to different parts of a cache line and getting half of one write and half of the other sent to main memory.

However, this doesn't help with read-modify-write operations such as increment, compare and swap, and so on. The MESI protocol won't stop two cores from each reading the same chunk of memory, each adding one to it, and then each writing the same value back, turning two increments into one.
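To make the lost update concrete, here is a minimal C++ sketch. The function name `split_increment` is made up for illustration. Each increment is deliberately written as a separate atomic load followed by a separate atomic store, so every individual access is coherent (which is the part cache coherence gives you), but nothing ties the read to the write.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: every thread performs `iters` increments, each one
// split into an independent load and an independent store.
long split_increment(int threads, long iters) {
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, iters] {
            for (long i = 0; i < iters; ++i) {
                // Read: another thread may read the same value right now.
                long seen = counter.load(std::memory_order_relaxed);
                // Write: may overwrite an increment made since the read.
                counter.store(seen + 1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter.load();
}
```

Calling `split_increment(2, 100000)` on a multi-core machine will usually return less than 200000, although the exact value varies from run to run and a given run may happen to lose nothing.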

On modern CPUs, the LOCK prefix locks the cache line so that the read-modify-write operation is logically atomic. The two step-by-step sequences that follow are oversimplified, but hopefully they'll give you the idea.

Unlocked increment:

1. Acquire cache line, shareable is fine. Read the value.
2. Add one to the read value.
3. Acquire cache line exclusive (if not already E or M) and lock it.
4. Write the new value to the cache line.
5. Change the cache line to modified and unlock it.

Locked increment:

1. Acquire cache line exclusive (if not already E or M) and lock it.
2. Read value.
3. Add one to it.
4. Write the new value to the cache line.
5. Change the cache line to modified and unlock it.

Notice the difference? In the unlocked increment, the cache line is only locked during the write memory operation, just like all writes, so another core can modify the line between this core's read and its write. In the locked increment, the cache line is held across the entire instruction, all the way from the read operation to the write operation and including during the increment itself.
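From C++ you normally get the locked form through `std::atomic` rather than by writing the prefix yourself. The sketch below uses `fetch_add`, which mainstream compilers targeting x86 typically turn into a LOCK-prefixed `add` or `xadd`; that mapping is an assumption about code generation, not something the language guarantees. The function name `locked_increment` is made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: every thread performs `iters` increments, each one
// a single atomic read-modify-write.
long locked_increment(int threads, long iters) {
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, iters] {
            for (long i = 0; i < iters; ++i) {
                // Read, add, and write happen as one indivisible step,
                // so no other thread can slip in between them.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter.load();
}
```

Here `locked_increment(4, 50000)` always returns 200000, because no increment can be lost.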

Also, some CPUs have things other than memory caches that can affect memory visibility. For example, some CPUs have a read prefetcher or a posted write buffer (a store buffer) that can result in memory operations executing out of order. Where needed, a LOCK prefix (or equivalent functionality on other CPUs) will also do whatever needs to be done to handle memory operation ordering issues. On x86 specifically, a LOCK-prefixed instruction acts as a full memory barrier: loads and stores are not reordered across it.
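The ordering side can be sketched with the classic store-buffering test. Each thread stores to one variable and then loads the other. With plain x86 `mov` instructions, both loads can return 0, because each store may still be sitting in its core's store buffer when the other core's load executes. With sequentially consistent `std::atomic` operations (the default), that outcome is forbidden by the C++ memory model; on x86, compilers typically get there by emitting the store as `xchg` (which is implicitly locked) or as `mov` followed by `mfence`. The instruction choice is an assumption about the compiler, and `count_both_zero` is a made-up name.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs the store-buffering test `rounds` times and
// returns how many rounds ended with both threads having read 0.
int count_both_zero(int rounds) {
    int both_zero = 0;
    for (int i = 0; i < rounds; ++i) {
        std::atomic<int> x{0};
        std::atomic<int> y{0};
        int r1 = -1;
        int r2 = -1;
        // Default memory order is seq_cst for both the stores and the loads.
        std::thread a([&] { x.store(1); r1 = y.load(); });
        std::thread b([&] { y.store(1); r2 = x.load(); });
        a.join();
        b.join();
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

With sequentially consistent operations `count_both_zero` returns 0 for any number of rounds. Switching the four operations to `std::memory_order_relaxed` removes that guarantee, and rounds where both threads read 0 become possible.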

178

answered Oct 07 '22 21:10

David Schwartz
