In general, most widely known implementations of reference-counting smart pointer classes in C++, including the standard std::shared_ptr, use atomic reference counting but do not provide atomic access to the same smart pointer instance. In other words, multiple threads may safely operate on separate shared_ptr instances that point to the same shared object, but multiple threads cannot safely read/write the same shared_ptr instance without providing some kind of external synchronization, such as a mutex.
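For example, here is a minimal sketch of the distinction (the mutex is just one way to provide that synchronization; std::atomic_load/std::atomic_store on shared_ptr, or C++20's std::atomic<std::shared_ptr>, are others):

#include <memory>
#include <mutex>
#include <thread>

std::shared_ptr<int> gp = std::make_shared<int>(42);
std::mutex gp_mutex;

// Safe: each thread works on its own shared_ptr instance. Only the
// shared reference count is touched concurrently, and that update
// is atomic.
void worker(std::shared_ptr<int> my_copy)
{
    std::shared_ptr<int> another = my_copy; // copies made from our own
                                            // instance are fine
    // ... use *another ...
}

// Not safe without synchronization: reading and writing the *same*
// shared_ptr instance (gp) from several threads is a data race, so a
// mutex guards it here.
void replace_global()
{
    std::lock_guard<std::mutex> lock(gp_mutex);
    gp = std::make_shared<int>(7);
}

int main()
{
    std::thread t1(worker, gp); // gp is copied here, before t2 starts
    std::thread t2(replace_global);
    t1.join();
    t2.join();
}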
An atomic version of shared_ptr, called "atomic_shared_ptr", has been proposed, and preliminary implementations already exist. Presumably, atomic_shared_ptr could easily be implemented with a spin lock or mutex, but a lock-free implementation is also possible.
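A minimal sketch of the lock-based route, using a spin lock (the class name and tiny interface below are my own illustration, not the proposed atomic_shared_ptr API):

#include <atomic>
#include <memory>
#include <utility>

// A lock-based stand-in for an atomic shared pointer: every operation
// takes a small spin lock, so it is thread-safe but not lock-free.
template <typename T>
class locked_shared_ptr {
public:
    locked_shared_ptr() = default;
    explicit locked_shared_ptr(std::shared_ptr<T> p) : ptr_(std::move(p)) {}

    std::shared_ptr<T> load() const
    {
        lock();
        std::shared_ptr<T> copy = ptr_; // refcount bump happens under the lock
        unlock();
        return copy;
    }

    void store(std::shared_ptr<T> desired)
    {
        lock();
        ptr_.swap(desired);             // swap under the lock...
        unlock();
    }                                   // ...old value released here, outside it

private:
    void lock() const
    {
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // spin
        }
    }
    void unlock() const { flag_.clear(std::memory_order_release); }

    std::shared_ptr<T> ptr_;
    mutable std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};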
After studying some of these implementations, one thing is obvious: implementing a lock-free std::shared_ptr is very difficult, and it seems to require so many compare_and_exchange operations as to make me question whether a simple spin lock would achieve better performance.
The main reason it's so difficult to implement a lock-free reference-counted pointer is the race that always exists between reading the shared control block (or the shared object itself, if we're talking about an intrusive shared pointer) and modifying the reference count.
In other words, you can't even safely read the reference count because you never know when some other thread has deallocated the memory where the reference count lives.
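To make the race concrete, here is a sketch of the naive approach and how it goes wrong (the control_block type and function names are my own illustration, not any particular library's internals):

#include <atomic>

// Hypothetical naive control block, for illustration only.
struct control_block {
    std::atomic<int> refcount{1};
    // ... pointer to the managed object, deleter, etc. ...
};

control_block* shared_ctrl = new control_block; // seen by both threads

// Thread A: tries to claim a reference.
void acquire()
{
    control_block* cb = shared_ctrl; // (1) read the control-block pointer
    // <-- thread B may run release() to completion right here
    cb->refcount.fetch_add(1);       // (2) increments memory that may
                                     //     already have been freed
}

// Thread B: drops the last reference.
void release()
{
    control_block* cb = shared_ctrl;
    if (cb->refcount.fetch_sub(1) == 1)
        delete cb;                   // the control block is gone
}

// Reading the pointer and incrementing the count it points to are two
// separate steps, so the block can vanish between them; this is the
// race a lock-free shared_ptr has to close.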
So, in general, various complex strategies are employed to create lock-free versions. The implementation here looks like it uses a double reference count strategy, where "local" references count the number of threads concurrently accessing the shared_ptr instance, and "shared" or "global" references count the number of shared_ptr instances pointing to the shared object.
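Roughly, the split-count idea looks something like this (names and layout are my own illustration of the general scheme, not code from any specific implementation):

#include <atomic>

// The shared state: one "global" count of shared_ptr instances that
// own the object.
struct control_block {
    std::atomic<long> shared_count{1};
    // ... managed object pointer, deleter, etc. ...
};

// What the atomic pointer actually stores: the control-block address
// plus a "local" count of threads currently inside an operation on
// this particular pointer. Both travel together in one exchangeable
// unit (a double-width CAS on many 64-bit targets), so a reader can
// bump local_count and learn the block address in a single step, which
// keeps the block alive for it even if the pointer is replaced
// underneath it.
struct counted_ptr {
    control_block* block;
    long           local_count;
};

struct atomic_shared_ptr_ish {
    std::atomic<counted_ptr> value;
    // Readers CAS local_count upward before touching *block; when an
    // operation finishes, the helpers it accumulated are folded back
    // into shared_count, so the block is only freed once both counts
    // agree that nobody is using it.
};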
Given all this complexity, I was really surprised to find a Dr. Dobbs article, from 2004 no less (way before C++11 atomics), that seems to nonchalantly solve this entire problem:
http://www.drdobbs.com/atomic-reference-counting-pointers/184401888
It looks like the author is claiming to somehow be able to:
"... [read] the pointer to the counter, increments the counter, and returns the pointer—all in such a manner that no other threads can cause an incorrect result"
But I don't really understand the way he actually implements this. He's using (non-portable) PowerPC instructions (the LL/SC primitives lwarx and stwcx) to pull this off.
The relevant code that does this is what he calls "aIandF" (atomic increment and fetch), which he defines as:

addr aIandF(addr r1)
{
    addr tmp;
    int c;
    do {
        do {
            tmp = *r1;
            if (!tmp) break;
            c = lwarx(tmp);
        } while (tmp != *r1);
    } while (tmp && !stwcx(tmp, c+1));
    return tmp;
};
Apparently, addr is a pointer type pointing to the shared object that owns the reference count variable.
My questions are: first, is this only possible on an architecture that supports LL/SC operations? It seems it would be impossible to do with cmpxchg. And secondly, how exactly does this work? I've read over this code a few times now, and I can't really understand what's going on. I understand what LL/SC primitives do; I just can't make any sense of the code.
The best I can understand is that addr r1 is the address of the pointer to the shared object, and also the address of the pointer to the reference count (which I guess means the reference count variable needs to be the first member of the struct that defines the shared object). He then dereferences r1, getting the actual address of the shared object. Then he load-links the value stored at that address and puts the result in c; this is the counter value. He then conditionally stores that value, incremented, back through tmp (which will fail if the reservation on the memory at tmp has been lost).
What I don't understand is how this works. The address of the shared object may never change and the LL/SC could succeed - but how does this help us if another thread has deallocated the shared object in the mean time?
Reference counting is one technique for tracking object lifetime: simply keep an extra counter along with each object that is created. The counter is the number of references that exist to the object; in the C/C++ case, this would be how many pointers refer to the object.
std::shared_ptr: all operations that do not change the reference count are as lock-free as the corresponding operations on a raw pointer.
The smart pointer allocates a small block of memory to contain the reference counter. Each copy of the smart pointer then receives a pointer to the actual object and a pointer to the reference count.
The shared reference counter counts the number of owners. Copying a std::shared_ptr increases the reference count by one. Destroying a std::shared_ptr decreases the reference count by one. If the reference count becomes zero, the resource will automatically be released.
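A tiny example of that behaviour:

#include <iostream>
#include <memory>

int main()
{
    std::shared_ptr<int> a = std::make_shared<int>(5);
    std::cout << a.use_count() << '\n';     // 1: a is the only owner

    {
        std::shared_ptr<int> b = a;         // copy: count goes to 2
        std::cout << a.use_count() << '\n'; // 2
    }                                       // b destroyed: back to 1

    std::cout << a.use_count() << '\n';     // 1
}                                           // a destroyed: count hits 0,
                                            // the int is released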
Here is aIandF again, with comments added:

addr aIandF(addr r1)
{
    addr tmp;
    int c;
    do {
        do {
            // r1 holds the address of the address of the refcount
            tmp = *r1;        // grab the address of the refcount
            if (!tmp)
                break;        // if it's null, bail

            // read current refcount and acquire reservation
            c = lwarx(tmp);

            // now we hold the reservation, check to see if another
            // thread has changed the shared block address
        } while (tmp != *r1); // if so, start over

        // if the store succeeds we know we held the reservation throughout
    } while (tmp && !stwcx(tmp, c+1));
    return tmp;
};
Note that aIandF is used specifically when constructing a copy of an existing shared pointer, claiming a reference for the copy.
The Dr Dobbs article describes the operation for releasing a reference as first atomically swapping the address of the shared counter in the source shared pointer object with a null pointer local to the function; then atomically decrementing the counter; then testing to see if the result of the decrement was zero. This order of operations is important: you say, "The address of the shared object may never change and the LL/SC could succeed - but how does this help us if another thread has deallocated the shared object in the mean time?" - but this can never happen, since the object will never be deallocated without the swap happening first, giving us a means to observe the change of address.
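In outline, the release path the article describes could be expressed like this (a sketch in terms of C++ atomics with my own names; the original, of course, uses the PowerPC primitives directly):

#include <atomic>

// The shared object carries its own reference count (intrusive style,
// as in the article).
struct refcounted {
    std::atomic<int> refcount;
    // ... the shared data ...
};

// "slot" is the counter-address field inside the source shared
// pointer object.
void release(std::atomic<refcounted*>& slot)
{
    // Step 1: atomically swap the counter address with null. After
    // this, no new reference can be claimed through this slot, and a
    // concurrent aIandF-style reader of the slot will observe the
    // change of address.
    refcounted* obj = slot.exchange(nullptr);
    if (!obj)
        return;                          // already released elsewhere

    // Step 2: atomically decrement the counter we detached.
    // Step 3: if that was the last reference, destroy the object.
    if (obj->refcount.fetch_sub(1) == 1)
        delete obj;
}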
aIandF tests for the counter address being null on entry. It can spot the address becoming null if that occurs before the lwarx, because it explicitly tests for this once it has the reservation.
If the swap in the decrementing thread occurs after the lwarx, we do not actually care: if the stwcx in aIandF succeeds, we know the decrementing thread will see the new reference count and not destroy the object, and we can proceed knowing we have claimed a reference to it; whereas if the other thread succeeds in decrementing the counter first, we will lose our reservation, the store will fail, and we'll detect the destruction of the object on the next loop iteration.
This algorithm assumes a strongly consistent memory model (all threads always see the effects of each other's reads and writes in program order) - this is not necessarily the case even on those modern architectures that do support LL/SC.
EDIT: thinking about it, the algorithm also apparently makes the assumption that it is always safe to read from a memory address that was once valid (e.g., no MMU/protection; or, the algorithm is broken):
if (!tmp)
    break;
// another thread could, at this point, do its swap, decrement
// *and* destroy the object tmp points to before we get to do
// anything else
c = lwarx(tmp);
// if that happened, we'll detect this fact and do nothing with c
// but ONLY if the lwarx doesn't trap (due to the memory tmp points
// to getting unmapped when the other thread frees the object)