 

Confusion about implementation error within shared_ptr destructor

I have just seen Herb Sutter's talk: C++ and Beyond 2012: Herb Sutter - atomic<> Weapons, 2 of 2

He shows a bug in an implementation of the std::shared_ptr destructor:

if( control_block_ptr->refs.fetch_sub(1, memory_order_relaxed ) == 0 )
    delete control_block_ptr; // B

He says that, due to memory_order_relaxed, the delete (line B) can be reordered before the fetch_sub.

At 1:25:18 he says: "Release doesn't keep line B below, where it should be."

How is that possible? There is a sequenced-before (and hence happens-before) relationship, because both statements are in a single thread. I might be wrong, but I think there is also a carries-a-dependency-to relationship between the fetch_sub and the delete.

If he is right, which items of the ISO standard support that?

asked Feb 14 '13 by qble

3 Answers

Imagine code that releases a shared pointer:

auto tmp = &(the_ptr->a);
*tmp = 10;
the_ptr.dec_ref();

If dec_ref() doesn't have "release" semantics, it's perfectly fine for the compiler (or CPU) to move things from before dec_ref() to after it, for example:

auto tmp = &(the_ptr->a);
the_ptr.dec_ref();
*tmp = 10;

And this is not safe, since dec_ref() can also be called from another thread at the same time and delete the object. So it must have "release" semantics for the things before dec_ref() to stay where they are.
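
In code, "release semantics" for the decrement would mean something like the following minimal sketch (dec_ref, object_ptr and refs are hypothetical names matching the example above, not a real shared_ptr API):

void dec_ref()
{
    // fetch_sub returns the previous value, so 1 means we just dropped
    // the last reference. The release ordering keeps reads and writes
    // sequenced before this call from being moved past it.
    if (object_ptr->refs.fetch_sub(1, std::memory_order_release) == 1)
        delete object_ptr;
}

As the rest of this answer shows, release alone is not enough once the destructor itself reads the object.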

Let's now imagine that the object's destructor looks like this:

~object() {
    auto xxx = a;
    printf("%i\n", xxx);
}

We will also modify the example a bit so that we have two threads:

// thread 1
auto tmp = &(the_ptr->a);
*tmp = 10;
the_ptr.dec_ref();

// thread 2
the_ptr.dec_ref();

Then, the "aggregated" code will look like:

// thread 1
auto tmp = &(the_ptr->a);
*tmp = 10;
{ // the_ptr.dec_ref();
    if (0 == atomic_sub(...)) {
        { //~object()
            auto xxx = a;
            printf("%i\n", xxx);
        }
    }
}

// thread 2
{ // the_ptr.dec_ref();
    if (0 == atomic_sub(...)) {
        { //~object()
            auto xxx = a;
            printf("%i\n", xxx);
        }
    }
}

However, if we only have "release" semantics for atomic_sub(), this code can be optimized this way:

// thread 2
auto xxx = the_ptr->a; // "auto xxx = a;" from destructor moved here
{ // the_ptr.dec_ref();
    if (0 == atomic_sub(...)) {
        { //~object()
            printf("%i\n", xxx);
        }
    }
}

But that way, the destructor will not always print the last value of "a" (this code is not race-free anymore). That's why we also need acquire semantics for atomic_sub() (or, strictly speaking, an acquire barrier when the counter becomes 0 after the decrement).
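
That last variant could look roughly like this, reusing the names from the question (a sketch of one possible implementation, not a quote from any particular library; note that fetch_sub returns the previous value, so 1 means the last reference was just released):

if (control_block_ptr->refs.fetch_sub(1, std::memory_order_release) == 1) {
    // The acquire fence pairs with the release decrements done by the
    // other owners, so all their writes to the object are visible before
    // the destructor runs and the memory is freed.
    std::atomic_thread_fence(std::memory_order_acquire);
    delete control_block_ptr; // B
}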

answered by wonder.mice


It looks like he is talking about synchronization of actions on the shared object itself, which are not shown in his code blocks (and as a result it is confusing).

That's why he put acq_rel there: all actions on the object should happen before its destruction, all in order.

But I'm still not sure why he talks about swapping delete with fetch_sub.

answered by qble


This is a late reply.

Let's start out with this simple type:

#include <iostream>

struct foo
{
    ~foo() { std::cout << value; }
    int value;
};

And we'll use this type in a shared_ptr, as follows:

#include <future>
#include <memory>

void runs_in_separate_thread(std::shared_ptr<foo> my_ptr)
{
    my_ptr->value = 5;
    my_ptr.reset();
}

int main()
{
    std::shared_ptr<foo> my_ptr(new foo);
    std::async(std::launch::async, runs_in_separate_thread, my_ptr);
    my_ptr.reset();
}

Two threads will be running in parallel, both sharing ownership of a foo object.

With a correct shared_ptr implementation (that is, one with memory_order_acq_rel), this program has defined behavior. The only value that this program will print is 5.

With an incorrect implementation (one using memory_order_relaxed) there are no such guarantees. The behavior is undefined because a data race on foo::value is introduced. The trouble occurs only in the case where the destructor runs in the main thread: with a relaxed memory order, the write to foo::value in the other thread may not propagate to the destructor in the main thread. A value other than 5 could be printed.

So what's a data race? Well, check out the definition and pay attention to the last bullet point:

When an evaluation of an expression writes to a memory location and another evaluation reads or modifies the same memory location, the expressions are said to conflict. A program that has two conflicting evaluations has a data race unless either

  • both conflicting evaluations are atomic operations (see std::atomic)
  • one of the conflicting evaluations happens-before another (see std::memory_order)

In our program, one thread will write to foo::value and one thread will read from foo::value. These are supposed to be sequential; the write to foo::value should always happen before the read. Intuitively, it makes sense that they would be, as the destructor is supposed to be the last thing that happens to an object.

memory_order_relaxed does not offer such ordering guarantees, though, and so memory_order_acq_rel is required.
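
For comparison with the snippet in the question, the fixed decrement could look roughly like this (a sketch only; the counter is compared against 1 here because std::atomic's fetch_sub returns the previous value):

if (control_block_ptr->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
    // Release: publishes this thread's writes (e.g. my_ptr->value = 5) to
    // whichever thread performs the final decrement.
    // Acquire: makes those writes visible here before ~foo() reads value.
    delete control_block_ptr;
}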

answered by Pubby