There is a popular spin-lock mutex version which is spreaded across the Internet and which one might encounter in the Anthony Williams book(C++ Concurrency in Action). Here it is: <pre class="prettyprint"><code>class SpinLock { std::atomic_flag locked; public: SpinLock() : locked{ATOMIC_FLAG_INIT} { } void lock() { while(locked.test_and_set(std::memory_order_acquire)); } void unlock() { locked.clear(std::memory_order_release); } }; </code></pre> The thing I do not understand is why everybody uses <code>std::memory_order_acquire</code> for the <code>test_and_set</code> which is an RMW operation. Why is it not <code>std::memory_acq_rel</code>? Suppose we have 2 threads simultaneously trying to acquire a lock: <pre class="prettyprint"><code>T1: test_and_set -> ret false T2: test_and_set -> ret false </code></pre> The situation should be possible since we have 2 <code>acquire</code> operations which don't form any <code>sync with</code> relationship between each other. Yes, after we have unlocked the mutex we have a <code>release</code> operation which heads a subsequent <code>release sequence</code> and life becomes colorful and everyone is happy. But why is it safe before the <code>release sequence</code> is headed? Since many people mention exactly that implementation I suppose it should work correctly. So what am I missing? UPDATE 1: I perfectly understand that the operation is atomic, that operations between <code>lock</code> and <code>unlock</code> can't go out of the critical section. This is not a problem. The problem is that I don't see how the code above prevents 2 mutexes coming into the critical section simultaneously. To prevent it from happening there should be happens before relationship between 2 <code>lock</code>s. Could someone show me, using the C++ standard notions, that the code is perfectly safe? UPDATE 2: Ok, we are close to the correct answer, I believe. I've found the following in the standard: [atomics.order] clause 11 <blockquote> Atomic read-modify-write operations shall always read the last value (in the modification order) written before the write associated with the read-modify-write operation. </blockquote> And on this major note I could happily close the question but I still have my doubts. What about <code>in the modification order</code> part? Standard is pretty clear about it: [intro.multithread] clause 8 <blockquote> All modifications to a particular atomic object M occur in some particular total order, called the modification order of M . If A and B are modifications of an atomic object M and A happens before(as defined below) B, then A shall precede B in the modification order of M , which is defined below. </blockquote> So according to that clause for an RMW operation to have the latest written value, the latest write operation should happen before the reading part or RMW operation. Which is not the case in the question. Right? UPDATE 3: I more and more think that the code for a spin lock is broken. Here is my reasoning. C++ specify 3 types of operations: <ul> <li>Acquire, release, acquire-release - these are sync ops.</li> <li>Relaxed - these are no sync ops</li> <li>RMW - these are operations with "special" characteristic</li> </ul> Let's start with RMW and find out what so special about them. First, they are a valuable asset in forming <code>release sequence</code>, second they have a special clause([atomics.order] clause 11) cited above. Nothing else special I found. Acquire/release are sync ops and <code>release sync with acquire</code> so forming a <code>happens before</code> relationship. Relaxed operations are just plain atomics and don't participate in the modification order at all. What we have in our code? We have an RMW operation which uses acquire memory semantics so whenever first unlock(release) is reached it serves 2 roles: <ol> <li>It forms a <code>sync with</code> relationship with the previous <code>release</code> </li> <li>It participates in the <code>release sequence</code>. But that's all true only after the first <code>unlock</code> has finished.</li> </ol> Before that, if we have 2+ threads which are simultaneously running our <code>lock</code> code then we can enter pass the <code>lock</code> simultaneously since 2 <code>acquire</code> operations don't form any kind of relationship. They are as unordered as relaxed operations would. Since they are unordered we can't use any special clauses about RMW operations since there is no <code>happens before</code> relationship and hence no modification order for the <code>locked</code> flag. So either my logic is flawed or code is broken. Please, whoever knows the truth - comment on this.

<blockquote> Could someone show me, using the C++ standard notions, that the code is perfectly safe? </blockquote> I initially had the same concerns as you. I think the key is understanding that operations on the <code>std::atomic_flag</code> variable are atomic with respect to all processors/cores. Two atomic 'test and set' operations in separate threads cannot simultaneously succeed, regardless of the memory ordering specified, since they then could not be atomic; the operation must apply to the actual variable, not a cached local copy (which is, I think, not even a concept in C++). A full chain of reasoning: 29.7 p5 (talking about the test-and-set operation): <blockquote> Effects: Atomically sets the value pointed to by object or by this to true. Memory is affected according to the value of order. These operations are atomic read-modify-write operations (1.10). Returns: Atomically, the value of the object immediately before the effects. </blockquote> 1.10 p6: <blockquote> All modifications to a particular atomic object M occur in some particular total order, called the modification order of M ... </blockquote> So, if as this case two threads attempt to lock the spinlock at the same time, one of them must be first and the other one second. We now just need to show that the second one will by necessity return that the flag was already set, thus preventing that thread from entering the critical section. Paragraph 6 goes on to say: <blockquote> ... If A and B are modifications of an atomic object M and A happens before (as defined below) B, then A shall precede B in the modification order of M , which is defined below. [ Note: This states that the modification orders must respect the “happens before” relationship. — end note ] </blockquote> There is no "happens before" relationship between the two test-and-set operations happening in the two threads, so we cannot determine which comes first in the modification order; however, due to the first sentence in p6 (which states that there is a total ordering), one must definitely come before the other. Now, from 29.3 p12: <blockquote> Atomic read-modify-write operations shall always read the last value (in the modification order) written before the write associated with the read-modify-write operation. </blockquote> This shows that the test-and-set ordered second must see the value written by the test-and-set ordered first. Any acquire/release choices do not affect this. Therefore, if two test-and-set operations are performed "simultaneously", they will be given an arbitrary order, and the second shall see the flag value that was set by the first. As such the memory order constraints specified for the test-and-set operation do not matter; they are used to control ordering of writes to other variables during the period where the spinlock is acquired. Response to "Update 2" of the question: <blockquote> So according to that clause for an RMW operation to have the latest written value, the latest write operation should happen before the reading part or RMW operation. Which is not the case in the question. Right? </blockquote> Correct that there is no "happen before" relationship, but incorrect that an RMW operation needs such a relationship in order to be guaranteed the latest written value. The statement you list as "[atomics.order] clause 11" does not require a "happens before" relationship, just that one operation is before the other in the "modification order" for the atomic flag. Clause 8 states that there will be such an order, and it will be a total ordering: <blockquote> All modifications to a particular atomic object M occur in some particular total order, called the modification order of M ... </blockquote> ... it then goes on to say that the total ordering must be consistent with any "happens before" relationships: <blockquote> ... If A and B are modifications of an atomic object M and A happens before (as defined below) B, then A shall precede B in the modification order of M, which is defined below. </blockquote> However, in the absence of a "happens before" relationship, there is still a total ordering - it's just that this ordering has a degree of arbitrariness. That is, if there is no "happens before" relationship between A and B, then it is not specified whether A is ordered before or after B. But it must be one or the other, because there is a particular total order. Why is memory_order_acquire needed, then? A mutex such as a spinlock is often used to protect other, non-atomic variables and data structures. Using <code>memory_order_acquire</code> when locking the spinlock assures that a read from such variables will see the correct values (i.e. the values written by any other thread that previously held the spinlock). For the unlock, <code>memory_order_release</code> is also needed in order to allow other threads to see the written values. The acquire/release both prevent the compiler from re-ordering reads/writes past the acquisition/release of the lock, and ensure that any necessary instructions to ensure appropriate levels of cache coherency are generated. Further evidence: First, this note from 29.3: <blockquote> Note: Atomic operations specifying memory_order_relaxed are relaxed with respect to memory ordering. Implementations must still guarantee that any given atomic access to a particular atomic object be indivisible with respect to all other atomic accesses to that object. — end note </blockquote> This is essentially saying that the memory ordering specified does not affect the atomic operation itself. The access must "be indivisible with respect to all other atomic accesses" including those from other threads. To allow two test-and-set operations to read the same value would effectively be dividing at least one of them, so that it was no longer atomic. Also, from 1.10 paragraph 5: <blockquote> In addition, there are relaxed atomic operations, which are not synchronization operations, and atomic read-modify-write operations, which have special characteristics. </blockquote> (A test-and-set falls into this latter category) and especially: <blockquote> “Relaxed” atomic operations are not synchronization operations even though, like synchronization operations, they cannot contribute to data races. </blockquote> (emphasis mine). A case where two threads both simultaneously executed an atomic test-and-set (and both performed the 'set' part) would be such a data race, so this text again indicates that this cannot happen. 1.10 p8: <blockquote> Note: The specifications of the synchronization operations define when one reads the value written by another. For atomic objects, the definition is clear. </blockquote> It means one thread reads the value written by another. It says that for atomic objects the definition is clear, meaning that no other synchronisation is necessary - it is enough to perform the operation on the atomic object; the effect will be seen immediately by other threads. In particular, 1.10 p19: <blockquote> [ Note: The four preceding coherence requirements effectively disallow compiler reordering of atomic operations to a single object, even if both operations are relaxed loads. This effectively makes the cache coherence guarantee provided by most hardware available to C ++ atomic operations. — end note ] </blockquote> Note the mention of cache coherence even in the presence of relaxed loads. This clearly shows that the test-and-set can only succeed in one thread at a time, since for one to fail either cache coherence is broken or the operation is not atomic.

I think what you're missing is that <code>test_and_set</code> is atomic, period. There is no memory ordering setting that makes this operation not atomic. If all we needed was an atomic test and set, we could specify any memory ordering. However, in this case, we need more than just an atomic "test and set" operation. We need to ensure that memory operations we performed after we confirmed that the lock was ours to take aren't re-ordered to before we observed the mutex to be unlocked. (Because those operations won't be atomic operations.) Consider: <ol> <li>Some reads of data not protected by the mutex.</li> <li>Some writes to data not protected by the mutex.</li> <li>We try to lock the mutex.</li> <li>We see the mutex as locked and fail to lock it.</li> <li>We see the mutex as unlocked and atomically lock it.</li> <li>Some reads of data protected by the mutex.</li> <li>Some writes to data protected by the mutex.</li> </ol> What is the one thing that must not happen? It's that the reads and writes in step 6 and 7 somehow get re-ordered to before step 5, stepping on another thread accessing shared data under protection of the mutex. The <code>test_and_set</code> operation is already atomic, so steps 4 and 5 are inherently safe. And steps 1 and 2 can't modify protected data (because they occur before we even try to lock) so there's no harm in re-ordering them around our lock operation. But steps 6 and 7 -- those must not be re-ordered to prior to us observing that the lock was unlocked so that we could atomically lock it. That would be a catastrophe. The definition of memory_order_acquire: "A load operation with this memory order performs the acquire operation on the affected memory location: no memory accesses in the current thread can be reordered before this load." Exactly what we need.

Basic spin-lock mutex implementation ordering

Tags:

c++

multithreading

atomic

There is a popular spin-lock mutex version which is spreaded across the Internet and which one might encounter in the Anthony Williams book(C++ Concurrency in Action). Here it is:

class SpinLock
{
    std::atomic_flag locked;
public:
    SpinLock() :
        locked{ATOMIC_FLAG_INIT}
    {
    }
    void lock() 
    {
        while(locked.test_and_set(std::memory_order_acquire));
    }
    void unlock() 
    {
        locked.clear(std::memory_order_release);
    }
};

The thing I do not understand is why everybody uses std::memory_order_acquire for the test_and_set which is an RMW operation. Why is it not std::memory_acq_rel? Suppose we have 2 threads simultaneously trying to acquire a lock:

T1: test_and_set -> ret false
T2: test_and_set -> ret false

The situation should be possible since we have 2 acquire operations which don't form any sync with relationship between each other. Yes, after we have unlocked the mutex we have a release operation which heads a subsequent release sequence and life becomes colorful and everyone is happy. But why is it safe before the release sequence is headed?

Since many people mention exactly that implementation I suppose it should work correctly. So what am I missing?

UPDATE 1:

I perfectly understand that the operation is atomic, that operations between lock and unlock can't go out of the critical section. This is not a problem. The problem is that I don't see how the code above prevents 2 mutexes coming into the critical section simultaneously. To prevent it from happening there should be happens before relationship between 2 locks. Could someone show me, using the C++ standard notions, that the code is perfectly safe?

UPDATE 2:

Ok, we are close to the correct answer, I believe. I've found the following in the standard:

[atomics.order] clause 11

Atomic read-modify-write operations shall always read the last value (in the modification order) written before the write associated with the read-modify-write operation.

And on this major note I could happily close the question but I still have my doubts. What about in the modification order part? Standard is pretty clear about it:

[intro.multithread] clause 8

All modifications to a particular atomic object M occur in some particular total order, called the modification order of M . If A and B are modifications of an atomic object M and A happens before(as defined below) B, then A shall precede B in the modification order of M , which is defined below.

So according to that clause for an RMW operation to have the latest written value, the latest write operation should happen before the reading part or RMW operation. Which is not the case in the question. Right?

UPDATE 3:

I more and more think that the code for a spin lock is broken. Here is my reasoning. C++ specify 3 types of operations:

Acquire, release, acquire-release - these are sync ops.
Relaxed - these are no sync ops
RMW - these are operations with "special" characteristic

Let's start with RMW and find out what so special about them. First, they are a valuable asset in forming release sequence, second they have a special clause([atomics.order] clause 11) cited above. Nothing else special I found.

Acquire/release are sync ops and release sync with acquire so forming a happens before relationship. Relaxed operations are just plain atomics and don't participate in the modification order at all.

What we have in our code? We have an RMW operation which uses acquire memory semantics so whenever first unlock(release) is reached it serves 2 roles:

It forms a sync with relationship with the previous release
It participates in the release sequence. But that's all true only after the first unlock has finished.

Before that, if we have 2+ threads which are simultaneously running our lock code then we can enter pass the lock simultaneously since 2 acquire operations don't form any kind of relationship. They are as unordered as relaxed operations would. Since they are unordered we can't use any special clauses about RMW operations since there is no happens before relationship and hence no modification order for the locked flag.

So either my logic is flawed or code is broken. Please, whoever knows the truth - comment on this.

340

asked Jun 07 '15 07:06

ixSci

2 Answers

Could someone show me, using the C++ standard notions, that the code is perfectly safe?

I initially had the same concerns as you. I think the key is understanding that operations on the std::atomic_flag variable are atomic with respect to all processors/cores. Two atomic 'test and set' operations in separate threads cannot simultaneously succeed, regardless of the memory ordering specified, since they then could not be atomic; the operation must apply to the actual variable, not a cached local copy (which is, I think, not even a concept in C++).

A full chain of reasoning:

29.7 p5 (talking about the test-and-set operation):

Effects: Atomically sets the value pointed to by object or by this to true. Memory is affected according to the value of order. These operations are atomic read-modify-write operations (1.10). Returns: Atomically, the value of the object immediately before the effects.

1.10 p6:

All modifications to a particular atomic object M occur in some particular total order, called the modification order of M ...

So, if as this case two threads attempt to lock the spinlock at the same time, one of them must be first and the other one second. We now just need to show that the second one will by necessity return that the flag was already set, thus preventing that thread from entering the critical section.

Paragraph 6 goes on to say:

... If A and B are modifications of an atomic object M and A happens before (as defined below) B, then A shall precede B in the modification order of M , which is defined below. [ Note: This states that the modification orders must respect the “happens before” relationship. — end note ]

There is no "happens before" relationship between the two test-and-set operations happening in the two threads, so we cannot determine which comes first in the modification order; however, due to the first sentence in p6 (which states that there is a total ordering), one must definitely come before the other. Now, from 29.3 p12:

Atomic read-modify-write operations shall always read the last value (in the modification order) written before the write associated with the read-modify-write operation.

This shows that the test-and-set ordered second must see the value written by the test-and-set ordered first. Any acquire/release choices do not affect this.

Therefore, if two test-and-set operations are performed "simultaneously", they will be given an arbitrary order, and the second shall see the flag value that was set by the first. As such the memory order constraints specified for the test-and-set operation do not matter; they are used to control ordering of writes to other variables during the period where the spinlock is acquired.

Response to "Update 2" of the question:

So according to that clause for an RMW operation to have the latest written value, the latest write operation should happen before the reading part or RMW operation. Which is not the case in the question. Right?

Correct that there is no "happen before" relationship, but incorrect that an RMW operation needs such a relationship in order to be guaranteed the latest written value. The statement you list as "[atomics.order] clause 11" does not require a "happens before" relationship, just that one operation is before the other in the "modification order" for the atomic flag. Clause 8 states that there will be such an order, and it will be a total ordering:

All modifications to a particular atomic object M occur in some particular total order, called the modification order of M ...

... it then goes on to say that the total ordering must be consistent with any "happens before" relationships:

... If A and B are modifications of an atomic object M and A happens before (as defined below) B, then A shall precede B in the modification order of M, which is defined below.

However, in the absence of a "happens before" relationship, there is still a total ordering - it's just that this ordering has a degree of arbitrariness. That is, if there is no "happens before" relationship between A and B, then it is not specified whether A is ordered before or after B. But it must be one or the other, because there is a particular total order.

Why is memory_order_acquire needed, then?

A mutex such as a spinlock is often used to protect other, non-atomic variables and data structures. Using memory_order_acquire when locking the spinlock assures that a read from such variables will see the correct values (i.e. the values written by any other thread that previously held the spinlock). For the unlock, memory_order_release is also needed in order to allow other threads to see the written values.

The acquire/release both prevent the compiler from re-ordering reads/writes past the acquisition/release of the lock, and ensure that any necessary instructions to ensure appropriate levels of cache coherency are generated.

Further evidence:

First, this note from 29.3:

Note: Atomic operations specifying memory_order_relaxed are relaxed with respect to memory ordering. Implementations must still guarantee that any given atomic access to a particular atomic object be indivisible with respect to all other atomic accesses to that object. — end note

This is essentially saying that the memory ordering specified does not affect the atomic operation itself. The access must "be indivisible with respect to all other atomic accesses" including those from other threads. To allow two test-and-set operations to read the same value would effectively be dividing at least one of them, so that it was no longer atomic.

Also, from 1.10 paragraph 5:

In addition, there are relaxed atomic operations, which are not synchronization operations, and atomic read-modify-write operations, which have special characteristics.

(A test-and-set falls into this latter category) and especially:

“Relaxed” atomic operations are not synchronization operations even though, like synchronization operations, they cannot contribute to data races.

(emphasis mine). A case where two threads both simultaneously executed an atomic test-and-set (and both performed the 'set' part) would be such a data race, so this text again indicates that this cannot happen.

1.10 p8:

Note: The specifications of the synchronization operations define when one reads the value written by another. For atomic objects, the definition is clear.

It means one thread reads the value written by another. It says that for atomic objects the definition is clear, meaning that no other synchronisation is necessary - it is enough to perform the operation on the atomic object; the effect will be seen immediately by other threads.

In particular, 1.10 p19:

[ Note: The four preceding coherence requirements effectively disallow compiler reordering of atomic operations to a single object, even if both operations are relaxed loads. This effectively makes the cache coherence guarantee provided by most hardware available to C ++ atomic operations. — end note ]

Note the mention of cache coherence even in the presence of relaxed loads. This clearly shows that the test-and-set can only succeed in one thread at a time, since for one to fail either cache coherence is broken or the operation is not atomic.

answered Sep 17 '22 15:09

davmac

I think what you're missing is that test_and_set is atomic, period. There is no memory ordering setting that makes this operation not atomic. If all we needed was an atomic test and set, we could specify any memory ordering.

However, in this case, we need more than just an atomic "test and set" operation. We need to ensure that memory operations we performed after we confirmed that the lock was ours to take aren't re-ordered to before we observed the mutex to be unlocked. (Because those operations won't be atomic operations.)

Consider:

Some reads of data not protected by the mutex.
Some writes to data not protected by the mutex.
We try to lock the mutex.
We see the mutex as locked and fail to lock it.
We see the mutex as unlocked and atomically lock it.
Some reads of data protected by the mutex.
Some writes to data protected by the mutex.

What is the one thing that must not happen? It's that the reads and writes in step 6 and 7 somehow get re-ordered to before step 5, stepping on another thread accessing shared data under protection of the mutex.

The test_and_set operation is already atomic, so steps 4 and 5 are inherently safe. And steps 1 and 2 can't modify protected data (because they occur before we even try to lock) so there's no harm in re-ordering them around our lock operation.

But steps 6 and 7 -- those must not be re-ordered to prior to us observing that the lock was unlocked so that we could atomically lock it. That would be a catastrophe.

The definition of memory_order_acquire: "A load operation with this memory order performs the acquire operation on the affected memory location: no memory accesses in the current thread can be reordered before this load."

Exactly what we need.

177

answered Sep 21 '22 15:09

David Schwartz

Related questions
                            
                                What kind of operations might one need to do before main()
                            
                                using exec to execute a system command in a new process
                            
                                C++ cross platform build automation
                            
                                Reference to member function? [duplicate]
                            
                                C++ cout and cin buffers, and buffers in general
                            
                                Why do inline functions have external linkage by default?
                            
                                How to keep a C++ class name unmodified with Cython?
                            
                                passing const pointer by reference
                            
                                why should all iterators / iterator adaptors not-movable in C++11?
                            
                                Is there a design reason why std::set doesnt have front and back member functions?
                            
                                Return type deduction with a private member variable
                            
                                how to verify client certificates using boost::asio SSL?
                            
                                Return type of assignment operator
                            
                                base operand of ‘->’ has non-pointer type
                            
                                Possibility of COW std::string implementation in C++11
                            
                                Code compiled with profiling flag does not generate gmon.out
                            
                                How to prevent specialization of a C++ template?
                            
                                Understand the behavior of the new-handler
                            
                                Change default value of CMAKE_CXX_FLAGS_DEBUG and friends in CMake
                            
                                std::atomic<std::chrono::high_resolution_clock::time_point> can not compile

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With