
In C++11 threads, what guarantees does a std::mutex have about memory visibility?

I am currently trying to learn the C++11 threading API, and I am finding that the various resources leave out an essential piece of information: how the CPU cache is handled. Modern CPUs have a cache for each core (meaning different threads may use different caches). This means that it is possible for one thread to write a value to memory and for another thread not to see it, even if it sees other changes that the first thread made.

Of course, any good threading API provides some way to solve this. In C++'s threading API, however, it is not clear how this works. I know that a std::mutex, for example, protects memory somehow, but it isn't clear what it does: does it flush the entire CPU cache, does it flush just the objects accessed inside the critical section from the current thread's cache, or something else?

Also, apparently, read-only access does not require a mutex. But if thread 1, and only thread 1, is continually writing to memory to modify an object, won't other threads potentially see an outdated version of that object, making some sort of cache clearing necessary?

Do the atomic types simply bypass the cache and read the value from main memory using a single CPU instruction? Do they make any guarantees about the other places in memory being accessed?

How does memory access in C++11's threading API work, in the context of CPU caches?

Some questions, such as this one, talk about memory fences and a memory model, but no source seems to explain these in the context of CPU caches, which is what this question asks about.

asked May 26 '18 by john01dav

People also ask

Is std::mutex thread-safe?

It is totally safe for multiple threads to read the same variable, but std::mutex cannot be locked by multiple threads simultaneously, even if those threads only want to read a value. Shared mutexes and shared locks allow this.
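For example (a minimal sketch of my own using C++17's std::shared_mutex; the names are illustrative), many readers may hold the lock at once, while a writer gets exclusive access:

#include <mutex>
#include <shared_mutex>

std::shared_mutex rw_mutex;
int shared_value = 0;

int read_value() {
    // Any number of threads can hold a shared lock concurrently.
    std::shared_lock<std::shared_mutex> lock(rw_mutex);
    return shared_value;
}

void write_value(int v) {
    // Only one thread at a time can hold the exclusive lock, and
    // no shared locks may be held while it is.
    std::unique_lock<std::shared_mutex> lock(rw_mutex);
    shared_value = v;
}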

What is the role of std::mutex?

The std::mutex class is a synchronization primitive that can be used to protect shared data from being simultaneously accessed by multiple threads.
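A typical use (a minimal sketch; the counter and function names are my own) pairs the mutex with a std::lock_guard, so the lock is released even if an exception is thrown:

#include <mutex>

std::mutex counter_mutex;
long counter = 0;

void increment() {
    // std::lock_guard locks in its constructor and unlocks in its
    // destructor (RAII), so every exit path releases the mutex.
    std::lock_guard<std::mutex> lock(counter_mutex);
    ++counter;
}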

Where is a mutex stored in memory?

The mutex in memory is not part of your process's memory; it's in the OS. If you have a nice class that you use to handle mutexes, it's really only a wrapper: the mutex will not disappear when the class goes out of scope (depending on what's in your destructor).

What is a thread mutex?

A mutual exclusion (mutex) is used cooperatively between threads to ensure that only one of the cooperating threads is allowed to access the data or run certain application code at a time. The word mutex is shorthand for a primitive object that provides MUTual EXclusion between threads.


2 Answers

std::mutex has release-acquire memory-ordering semantics: everything that happened in thread A before it released the mutex (from thread A's point of view) must be visible to thread B once thread B acquires that same mutex and enters the critical section.
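As a concrete illustration (a minimal sketch of my own; the variables and thread structure are assumptions, not from the question), the plain int written before the unlock in one thread is guaranteed to be visible after the lock in the other, with no atomics needed for the data itself:

#include <iostream>
#include <mutex>
#include <thread>

int shared_data = 0;  // plain, non-atomic data
bool ready = false;   // also protected by the mutex
std::mutex m;

void producer() {
    std::lock_guard<std::mutex> lock(m);
    shared_data = 42;  // happens-before the unlock (release)
    ready = true;
}

void consumer() {
    for (;;) {
        std::lock_guard<std::mutex> lock(m);  // acquire
        if (ready) {
            // Guaranteed to print 42: the unlock in producer()
            // synchronizes-with this lock.
            std::cout << shared_data << '\n';
            return;
        }
    }
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
}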

Have a read of http://en.cppreference.com/w/cpp/atomic/memory_order to get started. Another good resource is the book C++ Concurrency in Action. Having said this, when using the high-level synchronization primitives you should be able to get away with ignoring most of these details, unless you are curious or want to get your hands dirty.

answered Oct 20 '22 by Preet Kukreti


I think I understand what you are getting at. There are three things at play here.

  • The C++11 standard describes what happens at the language level: locking a std::mutex is a synchronization operation. The C++ standard does not describe how this is implemented; CPU caches do not exist as far as the C++ standard is concerned.

  • The C++ implementation, at some point, puts some machine code in your application that implements a mutex lock. The engineers creating this implementation must take into account both the C++11 spec and the architecture spec.

  • The CPU itself manages the cache in such a way as to provide the semantics the C++ implementation needs.

This may be easier to understand if you look at atomics, which translate to much smaller snippets of assembly code but still provide synchronization. For example, try this one on GodBolt:

#include <atomic>

std::atomic<int> value;

int acquire() {
    // Acquire load: reads and writes in this thread cannot be
    // reordered before it.
    return value.load(std::memory_order_acquire);
}

void release() {
    // Release store: reads and writes in this thread cannot be
    // reordered after it.
    value.store(0, std::memory_order_release);
}

You can see the assembly:

acquire():
  mov eax, DWORD PTR value[rip]
  ret
release():
  mov DWORD PTR value[rip], 0
  ret
value:
  .zero 4

So on x86, nothing extra is necessary; the CPU already provides the required memory-ordering semantics (an explicit mfence instruction exists, but for acquire and release it's implied by ordinary loads and stores). This is definitely not how it works on all processors; see the Power output:

acquire():
.LCF0:
0: addis 2,12,.TOC.-.LCF0@ha
  addi 2,2,.TOC.-.LCF0@l
  addis 3,2,.LANCHOR0@toc@ha # gpr load fusion, type int
  lwz 3,.LANCHOR0@toc@l(3)
  cmpw 7,3,3
  bne- 7,$+4
  isync
  extsw 3,3
  blr
  .long 0
  .byte 0,9,0,0,0,0,0,0
release():
.LCF1:
0: addis 2,12,.TOC.-.LCF1@ha
  addi 2,2,.TOC.-.LCF1@l
  lwsync
  li 9,0
  addis 10,2,.LANCHOR0@toc@ha
  stw 9,.LANCHOR0@toc@l(10)
  blr
  .long 0
  .byte 0,9,0,0,0,0,0,0
value:
  .zero 4

Here there are explicit barrier instructions (isync on the acquire path, lwsync on the release path) because the Power memory model provides fewer ordering guarantees without them.
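For comparison (my own addition, not part of the original answer), asking for the strongest ordering makes even x86 pay for an explicit barrier: compilers typically implement a sequentially consistent store (added to the snippet above) as an xchg, or a plain store followed by mfence, while Power needs a full sync barrier:

void release_seq_cst() {
    // Stronger than release: typically xchg (or mov + mfence) on
    // x86, and a full sync (hwsync) barrier on Power.
    value.store(0, std::memory_order_seq_cst);
}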

This just punts things down to a lower level, however. The CPU itself keeps the per-core caches coherent using a protocol like MESI.

In the MESI protocol, when a core wants to modify a cache line, it must first gain exclusive ownership of that line; the other cores mark their copies invalid, writing the contents out to main memory first if necessary. This is inefficient, but necessary. For this reason you don't want to pack a bunch of commonly used mutexes or atomic variables into a small region of memory, because you can end up with multiple cores fighting over the same cache line (this is called false sharing). The Wikipedia article is fairly comprehensive and has more detail than I'm giving here.
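A common mitigation (a sketch of my own, not from the original answer) is to align hot variables so each gets its own cache line; a 64-byte line is assumed here, and C++17 names this quantity std::hardware_destructive_interference_size:

#include <atomic>
#include <cstddef>

// Assumed line size; C++17 provides
// std::hardware_destructive_interference_size for this.
constexpr std::size_t kCacheLine = 64;

struct Counters {
    // Without alignas, these two counters could share a cache line,
    // and two cores incrementing them would ping-pong that line
    // between their caches (false sharing).
    alignas(kCacheLine) std::atomic<long> a{0};
    alignas(kCacheLine) std::atomic<long> b{0};
};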

Something I'm omitting is the fact that mutexes typically require some kind of kernel-level support in order for threads to go to sleep or wake up.

answered Oct 20 '22 by Dietrich Epp