I have just seen Herb Sutter's talk: C++ and Beyond 2012: Herb Sutter - atomic<> Weapons, 2 of 2
He shows bug in implementation of std::shared_ptr destructor:
if( control_block_ptr->refs.fetch_sub(1, memory_order_relaxed ) == 0 )
delete control_block_ptr; // B
He says, that due to memory_order_relaxed, delete can be placed before fetch_sub.
At 1:25:18 - Release doesn't keep line B below, where it should be
How that is possible? There is happens-before / sequenced-before relationship, because they are both in single thread. I might be wrong, but there is also carries-a-dependency-to between fetch_sub and delete.
If he is right, which ISO items support that?
Imagine a code that releases a shared pointer:
auto tmp = &(the_ptr->a);
*tmp = 10;
the_ptr.dec_ref();
If dec_ref() doesn't have a "release" semantic, it's perfectly fine for a compiler (or CPU) to move things from before dec_ref() to after it (for example):
auto tmp = &(the_ptr->a);
the_ptr.dec_ref();
*tmp = 10;
And this is not safe, since dec_ref() also can be called from other thread in the same time and delete the object. So, it must have a "release" semantic for things before dec_ref() to stay there.
Lets now imagine that object's destructor looks like this:
~object() {
auto xxx = a;
printf("%i\n", xxx);
}
Also we will modify example a bit and will have 2 threads:
// thread 1
auto tmp = &(the_ptr->a);
*tmp = 10;
the_ptr.dec_ref();
// thread 2
the_ptr.dec_ref();
Then, the "aggregated" code will look like:
// thread 1
auto tmp = &(the_ptr->a);
*tmp = 10;
{ // the_ptr.dec_ref();
if (0 == atomic_sub(...)) {
{ //~object()
auto xxx = a;
printf("%i\n", xxx);
}
}
}
// thread 2
{ // the_ptr.dec_ref();
if (0 == atomic_sub(...)) {
{ //~object()
auto xxx = a;
printf("%i\n", xxx);
}
}
}
However, if we only have a "release" semantic for atomic_sub(), this code can be optimized that way:
// thread 2
auto xxx = the_ptr->a; // "auto xxx = a;" from destructor moved here
{ // the_ptr.dec_ref();
if (0 == atomic_sub(...)) {
{ //~object()
printf("%i\n", xxx);
}
}
}
But that way, destructor will not always print the last value of "a" (this code is not race free anymore). That's why we also need acquire semantic for atomic_sub (or, strictly speaking, we need an acquire barrier when counter becomes 0 after decrement).
Looks like he is talking about synchronization of actions on shared object itself, which are not shown on his code blocks (and as the result - confusing).
That's why he put acq_rel
- because all actions on the object should happens before its destruction, all in order.
But I'm still not sure why he talks about swapping delete
with fetch_sub
.
This is a late reply.
Let's start out with this simple type:
struct foo
{
~foo() { std::cout << value; }
int value;
};
And we'll use this type in a shared_ptr
, as follows:
void runs_in_separate_thread(std::shared_ptr<foo> my_ptr)
{
my_ptr->value = 5;
my_ptr.reset();
}
int main()
{
std::shared_ptr<foo> my_ptr(new foo);
std::async(std::launch::async, runs_in_separate_thread, my_ptr);
my_ptr.reset();
}
Two threads will be running in parallel, both sharing ownership of a foo
object.
With a correct shared_ptr
implementation
(that is, one with memory_order_acq_rel
), this program has defined behavior.
The only value that this program will print is 5
.
With an incorrect implementation (using memory_order_relaxed
) there
are no such guarantees. The behavior is undefined because a data race of
foo::value
is introduced. The trouble occurs only for cases when the destructor
gets called in the main thread. With a relaxed memory order, the write
to foo::value
in the other thread may not propagate to the destructor in the main thread.
A value other than 5
could be printed.
So what's a data race? Well, check out the definition and pay attention to the last bullet point:
When an evaluation of an expression writes to a memory location and another evaluation reads or modifies the same memory location, the expressions are said to conflict. A program that has two conflicting evaluations has a data race unless either
- both conflicting evaluations are atomic operations (see std::atomic)
- one of the conflicting evaluations happens-before another (see std::memory_order)
In our program, one thread will write to foo::value
and one thread will
read from foo::value
. These are supposed to be sequential; the write
to foo::value
should always happen before the read. Intuitively, it
makes sense that they would be as the destructor is supposed to be the last
thing that happens to an object.
memory_order_relaxed
does not offer such ordering guarantees though and so memory_order_acq_rel
is required.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With