Let's take this struct:
struct entry {
atomic<bool> valid;
atomic_flag writing;
char payload[128];
}
Two treads A and B concurrently access this struct this way (let e
be an instance of entry
):
if (e.valid) {
// do something with e.payload...
} else {
while (e.writing.test_and_set(std::memory_order_acquire));
if (!e.valid) {
// write e.payload one byte at a time
// (the payload written by A may be different from the payload written by B)
e.valid = true;
e.writing.clear(std::memory_order_release);
}
}
I guess that this code is correct and does not present issues, but I want to understand why it works.
Quoting the C++ standard (29.3.13):
Implementations should make atomic stores visible to atomic loads within a reasonable amount of time.
Now, bearing this in mind, imagine that both thread A and B enter the else
block. Is this interleave possible?
A
and B
enter the else
branch, because valid
is false
A
sets the writing
flagB
starts to spin lock on the writing
flagA
reads the valid
flag (which is false
) and enters the if
blockA
writes the payloadA
writes true
on the valid flag; obviously, if A
reads valid
again, it would read true
A
clears the writing
flagB
sets the writing
flagB
reads a stale value of the valid flag (false
) and enters the if
blockB
writes its payloadB
writes true
on the valid
flagB
clears the writing
flagI hope this is not possible but when it comes to actually answer the question "why it is not possible?", I'm not sure of the answer. Here is my idea.
Quoting from the standard again (29.3.12):
Atomic read-modify-write operations shall always read the last value (in the modification order) written before the write associated with the read-modify-write operation.
atomic_flag::test_and_set()
is an atomic read-modify-write operation, as stated in 29.7.5.
Since atomic_flag::test_and_set()
always reads a "fresh value", and I'm calling it with the std::memory_order_acquire
memory ordering, then I cannot read a stale value of the valid
flag, because I must see all the side-effects caused by A
before the atomic_flag::clear()
call (which uses std::memory_order_release
).
Am I correct?
Clarification. My whole reasoning (wrong or correct) relies on 29.3.12. For what I understood so far, if we ignore the atomic_flag
, reading stale data from valid
is possible even if it's atomic
. atomic
doesn't seem to mean "always immediately visible" to every thread. The maximum guarantee you can ask for is a consistent order in the values you read, but you can still read stale data before getting the fresh one. Fortunately, atomic_flag::test_and_set()
and every exchange
operation have this crucial feature: they always read fresh data. So, only if you acquire/release on the writing
flag (not only on valid
), then you get the expected behavior. Do you see my point (correct or not)?
EDIT: my original question included the following few lines that gained too much attention if compared to the core of the question. I leave them for consistency with the answers that have been already given, but please ignore them if you are reading the question right now.
Is there any point invalid
being anatomic<bool>
andnot a plainbool
? Moreover, if it should be anatomic<bool>
,what is its 'minimum' memory ordering constraint that will not presentissues?
Inside the else
branch valid
should be protected by the acquire/release semantics imposed by the operations on waiting
. However this does not obliviate the need to make valid
an atomic:
You forgot to include the first line (if (e.valid)
) in your analysis. If valid
was an bool
instead of atomic<bool>
this access would be completely unprotected. Therefore you could have the situation where a change of valid
becomes visible to other threads before the payload
is completely written/visible. This means that a thread B
could evaluate e.valid
to true
and enter the do something with e.payload
branch while the payload
isn't completely written yet.
Other then that your analysis seems somewhat reasonable but not entirely correct to me. The thing to remember with memory ordering is that acquire and release semantics will pair up. Everything written before a release operation can safely be read after an acquire operation on the same veriable reads the modified value. With that in mind the release semantics on waiting.clear(...)
ensure that the write to valid
must be visible when the loop on writing.test_and_set(...)
exits, since the later reads the change of waiting(the write done in
waiting.clear(...)`) with acquire semantics and doesn't exit before that change is visible.
Regarding §29.3.12: It is relevant to the correctness of your code, but unrelated to the reading a stale valid
flag. You can't set the flag before the clear, so acquire-release semantics will ensure correctness there. §29.3.12 protects you from the following scenario:
- Both A and B enter the else branch, because valid is false
- A sets the writing flag
- B sees a stale value for writing and also sets it
- Both A and B read the valid flag (which is false), enter the if block and write the payload creating a race condition
Edit: For the minimal Ordering constraints: acquire for the loads and release for the stores should probably do the job, however depending on your target hardware you might as well stay with sequential consistency. For the difference between those semantics look here.
Section 29.3.12 has nothing to do with why this code is correct or incorrect. The section you want (in the draft version of the standard available online) is Section 1.10: "Multi-threaded executions and data races." Section 1.10 defines a happens-before relation on atomic operations, and on non-atomic operations with respect to atomic operations.
Section 1.10 says that if there are two non-atomic operations where you can not determine the happens-before relationship then you have a data-race. It further declares (Paragraph 21) that any program with a data-race has undefined behavior.
If e.valid
is not atomic then you have a data race between the first line of code and the line e.valid=true
. So all of your reasoning about the behavior in the else
clause is incorrect (the program has no defined behavior so there is nothing to reason about.)
On the other hand if all of your accesses to e.valid
were protected by atomic operations on e.writing
(like if the else
clause was your whole program) then your reasoning would be correct. Event 9 in your list could not happen. But the reason is not Section 29.3.12, it is again Section 1.10, which says that your non-atomic operations will appear to be sequentially consistent if there are no dataraces.
The pattern you are using is called double checked locking. Before C++11 it was impossible to implement double checked locking portably. In C++11 you can make double checked locking work correctly and portably. The way you do it is by declaring valid
to be atomic
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With