What is the purpose of the x86 LOCK prefix, if the MESI protocol prevents other cores from writing to "exclusive"-ly owned data anyway?
I am getting a little confused between what LOCK provides and what MESI provides?
I understand the MESI protocol is about ensuring the cores all see a consistent state of memory, but as I understand, it also prevents cores from writing to memory which another core is already writing to?
This protocol reduces the number of main memory transactions with respect to the MSI protocol. This marks a significant improvement in performance.
For example, Intel uses the MESIF cache coherence protocol. In their implementation, when a core made a read to an invalid line, the corresponding cache will perform a cache read transaction to get the data from either other caches or memory.
In multiprocessor systems with separate caches that share a common memory, a same datum can be stored in more than one cache. A data consistency problem may occur when a data is modified in one cache only. The protocols to maintain the coherency for multiple processors are called cache-coherency protocols.
Yes, implementing MESI(f) or any coherence protocol will have an impact on performance, but that impact is actually positive when compared to the case where no coherence protocol exists. In that case, every read/write would need to go to main memory (ie, your application will be 100's of times SLOWER).
The MESI protocol makes the memory caches effectively invisible. This means that multithreaded programs don't have to worry about a core reading stale data from them or two cores writing to different parts of a cache line and getting half of one write and half of the other sent to main memory.
However, this doesn't help with read-modify-write operations such as increment, compare and swap, and so on. The MESI protocol won't stop two cores from each reading the same chunk of memory, each adding one to it, and then each writing the same value back, turning two increments into one.
On modern CPUs, the LOCK prefix locks the cache line so that the read-modify-write operation is logically atomic. These are oversimplified, but hopefully they'll give you the idea.
Unlocked increment:
Locked increment:
Notice the difference? In the unlocked increment, the cache line is only locked during the write memory operation, just like all writes. In the locked increment, the cache line is held across the entire instruction, all the way from the read operation to the write operation and including during the increment itself.
Also, some CPUs have things other than memory caches that can affect memory visibility. For example, some CPUs have a read prefetcher or a posted write buffer that can result in memory operations executing out of order. Where needed, a LOCK prefix (or equivalent functionality on other CPUs) will also do whatever needs to be done to handle memory operation ordering issues.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With