 

Does std::mutex enforce cache coherence?

I have a non-atomic variable my_var and an std::mutex my_mut. I assume that, up to this point in the code, the programmer has followed this rule:

Each time the programmer modifies or writes to my_var, he locks and unlocks my_mut.

Assuming this, Thread1 performs the following:

my_mut.lock();
my_var.modify();
my_mut.unlock();

Here is the sequence of events I imagine in my mind:

  1. Prior to my_mut.lock();, there were possibly multiple copies of my_var in main memory and some local caches. These values do not necessarily agree, even if the programmer followed the rule.
  2. By the instruction my_mut.lock();, all writes from the previously executed my_mut critical section are visible in memory to this thread.
  3. my_var.modify(); executes.
  4. After my_mut.unlock();, there are possibly multiple copies of my_var in main memory and some local caches. These values do not necessarily agree, even if the programmer followed the rule. The value of my_var at the end of this thread will be visible to the next thread that locks my_mut, by the time it locks my_mut.

I have been having trouble finding a source that verifies that this is exactly how std::mutex should work. I consulted the C++ standard (ISO 2013) and found this passage:

[ Note: For example, a call that acquires a mutex will perform an acquire operation on the locations comprising the mutex. Correspondingly, a call that releases the same mutex will perform a release operation on those same locations. Informally, performing a release operation on A forces prior side effects on other memory locations to become visible to other threads that later perform a consume or an acquire operation on A.

Is my understanding of std::mutex correct?

Mark Wallace asked Feb 12 '26 02:02

2 Answers

C++ is specified in terms of relations between operations, not in particular hardware terms (like cache coherence). The C++ Standard defines a happens-before relationship, which roughly means that whatever happens before has completed all of its side effects, and those effects are therefore visible to whatever happens after.

And since you have entered an exclusive critical section, whatever happens within it happens before the next time that critical section is entered. So any subsequent entry into it will see everything that happened before. That's what the Standard mandates. Everything else (including cache coherence) is the implementation's duty: it has to make sure that the behavior it actually produces is consistent with the behavior the Standard describes.

ixSci answered Feb 13 '26 15:02


After my_mut.unlock();, there are possibly multiple copies of my_var in main memory and some local caches. These values do not necessarily agree, ...

Hardware already maintains cache coherence, so conflicting copies in different caches are impossible on real-world systems. AFAIK, there are no C++ implementations that run std::thread across cores without coherent caches, and it's unlikely to be a thing in the future. There are heterogeneous systems like ARM DSP + MCU, but you don't run threads of one program between such cores. (And you don't boot a single OS across such cores.)

There will be a value in DRAM for the address, but all CPU cores access memory through cache so that value doesn't matter: a Modified copy in another core's cache will take priority, thanks to hardware cache coherence.

See also

  • https://en.wikipedia.org/wiki/MESI_protocol the standard cache-coherency protocol. Modern CPUs don't use a shared bus, though; they use a directory (e.g. L3 tags) to keep track of which core might have a modified copy of any given line, so they know which core to signal to write back a line when a Read For Ownership (write miss) or share request (read miss) happens for that line.
  • When to use volatile with multi threading? (Never, except Linux kernel code which does roll its own memory_order_relaxed ops with volatile on GCC and Clang, with inline asm for more ordering when needed. But cache-coherent hardware is why just volatile does work a lot like atomic with relaxed.)
  • Is cache coherency required for memory consistency? including discussion in comments - implementing C++'s coherency requirements with manual flushing would be very onerous, e.g. every release store would have to know what parts of cache to flush, but the compiler normally doesn't know which variables are shared or not. And worse, dirty write-back caches would need to get written back before writes from other cores so our later loads can actually see them.
  • http://eel.is/c++draft/intro.races#19 - [Note 19: The four preceding coherence requirements effectively disallow compiler reordering of atomic operations to a single object, even if both operations are relaxed loads. This effectively makes the cache coherence guarantee provided by most hardware available to C++ atomic operations. — end note]

Programs running on cores with non-coherent shared memory can use it for message-passing, e.g. via MPI, where the program is explicit about which memory regions are flushed when. C++'s multithreaded memory model is not suitable for such systems. That's why mainstream multi-CPU systems are ccNUMA; non-coherent shared memory can be found between nodes of a cluster, but again that's where you'd use MPI or something, not C++ threads across separate instances of an OS running on separate nodes.

Peter Cordes answered Feb 13 '26 17:02