C++ standard: can relaxed atomic stores be lifted above a mutex lock?

Tags:

c++

memory-model

mutex

stdatomic

relaxed-atomics

Is there any wording in the standard that guarantees that relaxed stores to atomics won't be lifted above the locking of a mutex? If not, is there any wording that explicitly says that it's kosher for the compiler or CPU to do so?

For example, take the following program. (It could use acquire/release ordering on foo_has_been_set and avoid the lock, and/or make foo itself atomic; it's written this way to illustrate the question.)

#include <atomic>
#include <cassert>
#include <mutex>

std::mutex mu;
int foo = 0;  // Guarded by mu
std::atomic<bool> foo_has_been_set{false};

void SetFoo() {
  mu.lock();
  foo = 1;
  foo_has_been_set.store(true, std::memory_order_relaxed);
  mu.unlock();
}

void CheckFoo() {
  if (foo_has_been_set.load(std::memory_order_relaxed)) {
    mu.lock();
    assert(foo == 1);
    mu.unlock();
  }
}

Is it possible for CheckFoo to crash in the above program if another thread is calling SetFoo concurrently, or is there some guarantee that the store to foo_has_been_set can't be lifted above the call to mu.lock by the compiler and CPU?

This is related to an older question, but it's not 100% clear to me that the answer there applies to this. In particular, the counter-example in that question's answer may apply to two concurrent calls to SetFoo, but I'm interested in the case where the compiler knows that there is one call to SetFoo and one call to CheckFoo. Is that guaranteed to be safe?

I'm looking for specific citations in the standard.


asked Aug 03 '17 05:08

jacobsa

2 Answers

I think I've figured out the particular partial order edges that guarantee the program can't crash. In the answer below I'm referencing version N4659 of the draft standard (https://timsong-cpp.github.io/cppwp/n4659/).

The code involved for the writer thread A and reader thread B is:

A1: mu.lock()
A2: foo = 1
A3: foo_has_been_set.store(relaxed)
A4: mu.unlock()

B1: foo_has_been_set.load(relaxed) <-- (stop if false)
B2: mu.lock()
B3: assert(foo == 1)
B4: mu.unlock()

We seek a proof that if B3 executes, then A2 happens before B3, as defined in [intro.races]/10. By [intro.races]/10.2, it's sufficient to prove that A2 inter-thread happens before B3.

Because lock and unlock operations on a given mutex happen in a single total order ([thread.mutex.requirements.mutex]/5), we must have either A1 or B2 coming first. The two cases:

1. Assume that A1 comes before B2 in the mutex's total order. Then by [thread.mutex.class]/1 and [thread.mutex.requirements.mutex]/25, we know that A4 synchronizes with B2. Therefore by [intro.races]/9.1, A4 inter-thread happens before B2. Since B2 is sequenced before B3, by [intro.races]/9.3.1 we know that A4 inter-thread happens before B3. Since A2 is sequenced before A4, by [intro.races]/9.3.2, A2 inter-thread happens before B3.
2. Assume that B2 comes before A1 in the mutex's total order. Then by the same logic as above, we know that B4 synchronizes with A1. So since A1 is sequenced before A3, by [intro.races]/9.3.1, B4 inter-thread happens before A3. Therefore since B1 is sequenced before B4, by [intro.races]/9.3.2, B1 inter-thread happens before A3. Therefore by [intro.races]/10.2, B1 happens before A3. But then according to [intro.races]/16, B1 must take its value from a side effect that precedes A3 in the modification order of foo_has_been_set, which can only be the initial value false. Therefore the load returns false, and B2 never runs in the first place. In other words, this case can't happen.

So if B3 executes at all (case 1), A2 happens before B3 and the assert will pass. ∎
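As a sanity check on the argument above, here is a rough stress-test sketch of the same two-thread scenario. The names Trial and CountViolations are made up for this illustration, and the assert is replaced by recording what the reader observed so the harness can count failures. A test like this can only fail to find a counter-example; it is the proof, not the test, that rules one out.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Hypothetical harness: fresh state for one writer/reader pair.
struct Trial {
  std::mutex mu;
  int foo = 0;  // Guarded by mu
  std::atomic<bool> foo_has_been_set{false};

  // Written only by the reader thread, inspected after join().
  bool reader_saw_flag = false;
  int observed_foo = -1;

  void SetFoo() {
    std::lock_guard<std::mutex> lock(mu);
    foo = 1;
    foo_has_been_set.store(true, std::memory_order_relaxed);
  }

  void CheckFoo() {
    if (foo_has_been_set.load(std::memory_order_relaxed)) {
      std::lock_guard<std::mutex> lock(mu);
      reader_saw_flag = true;
      observed_foo = foo;  // Stands in for assert(foo == 1).
    }
  }
};

// Runs `trials` independent writer/reader pairs and counts how often the
// reader entered the guarded branch but did not observe foo == 1.
int CountViolations(int trials) {
  int violations = 0;
  for (int i = 0; i < trials; ++i) {
    Trial t;
    std::thread writer([&t] { t.SetFoo(); });
    std::thread reader([&t] { t.CheckFoo(); });
    writer.join();
    reader.join();
    if (t.reader_saw_flag && t.observed_foo != 1) ++violations;
  }
  return violations;
}
```

Calling CountViolations from main with a few thousand trials should return 0 on a conforming implementation, per the two cases above.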

answered Sep 27 '22 21:09

jacobsa

No memory operation inside a mutex-protected region can 'escape' from that region. That applies to all memory operations, atomic and non-atomic.

In section 1.10.1:

a call that acquires a mutex will perform an acquire operation on the locations comprising the mutex. Correspondingly, a call that releases the same mutex will perform a release operation on those same locations.

Furthermore, in section 1.10.1.6:

All operations on a given mutex occur in a single total order. Each mutex acquisition “reads the value written” by the last mutex release.

And in 30.4.3.1:

A mutex object facilitates protection against data races and allows safe synchronization of data between execution agents

This means that acquiring (locking) a mutex sets a one-way barrier that prevents operations that are sequenced after the acquire (inside the protected area) from moving up across the mutex lock.

Releasing (unlocking) a mutex sets a one-way barrier that prevents operations that are sequenced before the release (inside the protected area) from moving down across the mutex unlock.

In addition, memory operations that are released by a mutex are synchronized (visible) with another thread that acquires the same mutex.

In your example, foo_has_been_set is checked in CheckFoo. If it reads true, you know that the value 1 has been assigned to foo by SetFoo, but that write is not yet synchronized with the reading thread. The mutex lock that follows performs the acquire that makes foo visible; at that point synchronization is complete and the assert cannot fire.
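The same release/acquire pairing is what the lock-free variant mentioned in passing in the question would rely on. A minimal sketch, assuming the flag itself carries the ordering and the mutex is dropped entirely; Published and RunOnce are names invented for this example, and CheckFoo returns the observed value instead of asserting:

```cpp
#include <atomic>
#include <thread>

// Hypothetical lock-free variant: the release store publishes foo,
// and the acquire load that reads true synchronizes with it.
struct Published {
  int foo = 0;  // Plain int, published through the flag.
  std::atomic<bool> foo_has_been_set{false};

  void SetFoo() {
    foo = 1;
    foo_has_been_set.store(true, std::memory_order_release);
  }

  // Returns -1 if the flag was not yet set, otherwise the value of foo.
  int CheckFoo() {
    if (foo_has_been_set.load(std::memory_order_acquire)) {
      return foo;  // Safe: the write to foo happens before this read.
    }
    return -1;
  }
};

// Runs one writer and one reader concurrently; returns what the reader saw.
int RunOnce() {
  Published p;
  int seen = -1;
  std::thread writer([&p] { p.SetFoo(); });
  std::thread reader([&p, &seen] { seen = p.CheckFoo(); });
  writer.join();
  reader.join();
  return seen;
}
```

Here the reader touches foo only after an acquire load has observed true, so there is no data race on foo even without the mutex. With relaxed ordering on both sides and no lock, that guarantee would be lost.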

answered Sep 27 '22 22:09

LWimsey
