I've been looking into implementations of atomic reference counting.
Most of the operations are very consistent between libraries, but I've found a surprising variety in the "decrease refcount" operation. (Note that, generally, the only difference between shared and weak decref is which on_zero() is called. Exceptions are noted below.)
If there are other implementations written in terms of the C11/C++11 memory model (what does MSVC do?), other than the "we use seq_cst because we don't know any better" kind, feel free to edit them in.
Most of the examples were originally C++, but here I've rewritten them in C, inlined, and normalized to the >= 1 convention:
#include <stdatomic.h>
#include <stddef.h>
typedef struct RefPtr RefPtr;
struct RefPtr {
    _Atomic(size_t) refcount;
};
// calls the destructor and/or calls free
// on a shared_ptr, this also calls decref on the implicit weak_ptr
void on_zero(RefPtr *);
From the Boost intrusive_ptr examples and OpenSSL:
void decref_boost_intrusive_docs(RefPtr *p) {
    if (atomic_fetch_sub_explicit(&p->refcount, 1, memory_order_release) == 1) {
        atomic_thread_fence(memory_order_acquire);
        on_zero(p);
    }
}
It would be possible to use memory_order_acq_rel for the fetch_sub operation, but this results in unneeded "acquire" operations when the reference counter does not yet reach zero and may impose a performance penalty.
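To make the release/fence pairing concrete, here is a minimal sketch of an object with a plain (non-atomic) payload that is freed by whichever caller drops the last reference. The `Counted` type and the `counted_*` helpers are hypothetical names invented for this illustration; they are not taken from Boost or OpenSSL.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object type for illustration only. */
typedef struct Counted {
    _Atomic(size_t) refcount;
    int payload; /* plain data that the final owner reads or destroys */
} Counted;

Counted *counted_new(int payload) {
    Counted *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    atomic_init(&c->refcount, 1);
    c->payload = payload;
    return c;
}

void counted_incref(Counted *c) {
    assert(c != NULL);
    /* relaxed is enough here: the caller already holds a reference */
    atomic_fetch_add_explicit(&c->refcount, 1, memory_order_relaxed);
}

/* Returns 1 if this call dropped the last reference and freed the object. */
int counted_decref(Counted *c) {
    assert(c != NULL);
    /* release: orders this thread's earlier accesses to *c before the decrement */
    if (atomic_fetch_sub_explicit(&c->refcount, 1, memory_order_release) != 1)
        return 0;
    /* acquire fence, paid only on the final decrement: synchronizes with
       the release decrements performed by the other owners */
    atomic_thread_fence(memory_order_acquire);
    free(c);
    return 1;
}
```

Only the call that observes the old value 1 pays for the acquire fence; every other decrement is a release-only read-modify-write.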
But most others (Boost, libstdc++, libc++ shared) do something else:
void decref_common(RefPtr *p) {
    if (atomic_fetch_sub_explicit(&p->refcount, 1, memory_order_acq_rel) == 1)
        on_zero(p);
}
But libc++ does something different for the weak count. Curiously, this is in an external source file:
void decref_libcxx_weak(RefPtr *p) {
    if (atomic_load_explicit(&p->refcount, memory_order_acquire) == 1)
        on_zero(p);
    else
        decref_common(p);
}
The question, then, is: what are the actual differences?
Sub-questions: Are the comments wrong? What do specific platforms do (on aarch64, would ldar be cheaper than dmb ishld? also ia64?)? Under what conditions can weaker versions be used (e.g. if the dtor is a nop, if the deleter is just free, ...)?
See also "Atomic Reference Counting" and "Why is an acquire barrier needed before deleting the data in an atomically reference counted smart pointer?"
The libc++ choice is documented in the source code:
NOTE: The acquire load here is an optimization of the very common case where a shared pointer is being destructed while having no other contended references.
The libc++ developers observed that, most of the time, when the last shared_ptr is destroyed there is no weak_ptr referencing the shared object. As far as I know, at least on x86, read-modify-write instructions are much more expensive than plain read instructions. So, for the most common case, they decided to avoid performing an expensive and unnecessary read-modify-write. Other implementations of the standard library do not perform this optimization.
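One observable consequence of this fast path is that the counter is never decremented when the caller is the sole remaining owner: it is left at 1 and cleanup runs directly. The sketch below illustrates that under the >= 1 convention used in the question. The `WeakCount` type, its `destroyed` flag, and the function names are hypothetical and exist only for this demonstration; they are not libc++'s actual identifiers.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical control block for illustration only. */
typedef struct WeakCount {
    _Atomic(size_t) weak; /* >= 1 convention: 1 means "last owner" */
    int destroyed;        /* set by the cleanup hook so the effect is visible */
} WeakCount;

static void weak_on_zero(WeakCount *w) {
    w->destroyed = 1;
}

void weak_release(WeakCount *w) {
    if (atomic_load_explicit(&w->weak, memory_order_acquire) == 1) {
        /* fast path: a plain load, no read-modify-write;
           the counter is deliberately left at 1 */
        weak_on_zero(w);
    } else if (atomic_fetch_sub_explicit(&w->weak, 1, memory_order_acq_rel) == 1) {
        /* slow path: another owner existed when we looked, so decrement
           atomically and clean up if we turned out to be last after all */
        weak_on_zero(w);
    }
}
```

The shortcut relies on the assumption that, once the count reads 1, no other thread can still create a new reference. That is plausible for the weak count after the last shared owner is gone, and it would explain why the same trick is not applied to the shared count.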