Difference between atomic decref implementations

I've been looking into implementations of atomic reference counting.

Most of the operations are very consistent between libraries, but I've found a surprising variety in the "decrease refcount" operation. (Note that, generally, the only difference between shared and weak decref is which on_zero() is called. Exceptions are noted below.)

If there are other implementations written in terms of the C11/C++11 memory model (what does MSVC do?), other than the "we use seq_cst because we don't know any better" kind, feel free to edit them in.

Most of the examples were originally C++, but here I've rewritten them in C, inlined them, and normalized them to the >= 1 convention (a stored count of 1 means exactly one reference remains):

#include <stdatomic.h>
#include <stddef.h>
typedef struct RefPtr RefPtr;
struct RefPtr {
    _Atomic(size_t) refcount;
};
// calls the destructor and/or calls free
// on a shared_ptr, this also calls decref on the implicit weak_ptr
void on_zero(RefPtr *);

From Boost intrusive_ptr examples and openssl:

void decref_boost_intrusive_docs(RefPtr *p) {
    if (atomic_fetch_sub_explicit(&p->refcount, 1, memory_order_release) == 1) {
        atomic_thread_fence(memory_order_acquire);
        on_zero(p);
    }
}

The accompanying comment reads: "It would be possible to use memory_order_acq_rel for the fetch_sub operation, but this results in unneeded 'acquire' operations when the reference counter does not yet reach zero and may impose a performance penalty."

But most others (Boost, libstdc++, libc++ shared) do something else:

void decref_common(RefPtr *p) {
    if (atomic_fetch_sub_explicit(&p->refcount, 1, memory_order_acq_rel) == 1)
        on_zero(p);
}
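
To make the comparison concrete, here is a minimal single-threaded sketch that runs both variants against a stand-in on_zero(). The zero_calls field, incref(), the renamed decref functions and run_demo() are all hypothetical names added for illustration. The sketch only shows that the two variants agree on when on_zero() fires; it cannot show the memory-ordering difference, which only matters when other threads are involved.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct RefPtr {
    _Atomic(size_t) refcount;
    int zero_calls; /* hypothetical: counts on_zero() calls for this demo */
} RefPtr;

/* Stand-in for the real destructor and/or free. */
static void on_zero(RefPtr *p) { p->zero_calls++; }

/* Assumption: the caller already holds a reference, so relaxed is enough. */
static void incref(RefPtr *p) {
    atomic_fetch_add_explicit(&p->refcount, 1, memory_order_relaxed);
}

/* Release on every decrement; acquire fence only on the final one. */
static void decref_release_then_fence(RefPtr *p) {
    if (atomic_fetch_sub_explicit(&p->refcount, 1, memory_order_release) == 1) {
        atomic_thread_fence(memory_order_acquire);
        on_zero(p);
    }
}

/* Acquire and release on every decrement, final or not. */
static void decref_acq_rel(RefPtr *p) {
    if (atomic_fetch_sub_explicit(&p->refcount, 1, memory_order_acq_rel) == 1)
        on_zero(p);
}

/* Returns how many times on_zero() ran, or -1 if it ran too early. */
static int run_demo(void (*decref)(RefPtr *)) {
    RefPtr p;
    atomic_init(&p.refcount, 1); /* one initial owner */
    p.zero_calls = 0;

    incref(&p);                  /* 1 -> 2 */
    decref(&p);                  /* 2 -> 1: must not call on_zero() */
    if (p.zero_calls != 0)
        return -1;
    decref(&p);                  /* 1 -> 0: calls on_zero() */
    return p.zero_calls;
}
```

Calling run_demo(decref_release_then_fence) and run_demo(decref_acq_rel) exercises each variant in turn. Any real difference would have to be measured under contention on the target platform.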

But libc++ does something different for the weak count. Curiously, this is in an external source file:

void decref_libcxx_weak(RefPtr *p) {
    if (atomic_load_explicit(&p->refcount, memory_order_acquire) == 1)
        on_zero(p);
    else
        decref_common(p);
}

The question, then, is: what are the actual differences?

Sub-questions: Are the comments wrong? What do specific platforms do (on aarch64, would ldar be cheaper than dmb ishld? What about ia64?)? Under what conditions can weaker versions be used (e.g. if the dtor is a nop, if the deleter is just free, ...)?

See also "Atomic Reference Counting" and "Why is an acquire barrier needed before deleting the data in an atomically reference counted smart pointer?"

Asked Nov 06 '22 17:11 by o11c

1 Answer

The libc++ choice is documented in the source code:

NOTE: The acquire load here is an optimization of the very common case where a shared pointer is being destructed while having no other contended references.

The libc++ developers observed that, most of the time, no weak_ptr references the shared object when the last shared_ptr is destroyed. As far as I know, at least on x86, read-modify-write instructions are much more expensive than plain loads. So, for the most common case, they chose to avoid an expensive and unnecessary read-modify-write. Other standard library implementations do not perform this optimization.
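
As a rough illustration of that fast path, here is a single-threaded sketch of a control block with a shared count and a weak count, using the question's >= 1 convention. This is not libc++ code: ControlBlock, release_shared(), release_weak(), demo_teardown() and the bookkeeping fields are hypothetical names, and the plain int counters are only safe because the demo never runs concurrently. The idea, as I understand it, is that a thread which loads a weak count of 1 holds the only remaining reference, so it can skip the decrement entirely.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct ControlBlock {
    _Atomic(size_t) shared; /* number of shared owners */
    _Atomic(size_t) weak;   /* weak owners, plus 1 held jointly by the shared owners */
    int object_destroyed;   /* demo bookkeeping, not thread-safe */
    int block_freed;        /* demo bookkeeping, not thread-safe */
    int weak_rmw_ops;       /* read-modify-writes performed on `weak` */
} ControlBlock;

static void release_weak(ControlBlock *cb) {
    /* Fast path: a plain acquire load shows we are the last reference. */
    if (atomic_load_explicit(&cb->weak, memory_order_acquire) == 1) {
        cb->block_freed++;
        return;
    }
    /* Slow path: other references exist, so pay for the read-modify-write. */
    cb->weak_rmw_ops++;
    if (atomic_fetch_sub_explicit(&cb->weak, 1, memory_order_acq_rel) == 1)
        cb->block_freed++;
}

static void release_shared(ControlBlock *cb) {
    if (atomic_fetch_sub_explicit(&cb->shared, 1, memory_order_acq_rel) == 1) {
        cb->object_destroyed++; /* stand-in for running the destructor */
        release_weak(cb);       /* drop the implicit weak reference */
    }
}

/* Tears down one shared owner plus `extra_weak_refs` weak owners.
 * Returns the number of RMW operations on the weak count, or -1 on error. */
static int demo_teardown(size_t extra_weak_refs) {
    ControlBlock cb;
    atomic_init(&cb.shared, 1);
    atomic_init(&cb.weak, 1 + extra_weak_refs);
    cb.object_destroyed = cb.block_freed = cb.weak_rmw_ops = 0;

    release_shared(&cb);                 /* last shared owner goes away */
    for (size_t i = 0; i < extra_weak_refs; i++)
        release_weak(&cb);               /* each weak owner goes away */

    if (cb.object_destroyed != 1 || cb.block_freed != 1)
        return -1;
    return cb.weak_rmw_ops;
}
```

With no weak owners, demo_teardown(0) touches the weak count only with a load. Each additional weak owner costs one read-modify-write, and the final release always takes the load-only path.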

Answered Nov 15 '22 05:11 by Oliv
