
Atomic shared_ptr for lock-free singly linked list

I'm wondering if it is possible to create a lock-free, thread-safe shared pointer for any of the "common" architectures, like x64 or ARMv7 / ARMv8.

In a talk about lock-free programming at CppCon 2014, Herb Sutter presented a (partial) implementation of a lock-free singly linked list. The implementation looks quite simple, but it relies on an atomic shared_ptr implementation that doesn't exist in the standard library yet, or on using the specialized std::atomic... functions. This is especially important as single push/pop calls potentially invoke multiple atomic loads/stores and compare_exchange operations.

The problem I see (and I think some of the questions in the talk went into the same direction) is that for this to be an actual lock-free data structure, those atomic operations would have to be lock-free themselves. I don't know of any standard library implementation for the std::atomic... functions that is lock-free and - at least with a short google / SO search - I also didn't find a suggestion of how to implement a lock-free specialization for std::atomic<std::shared_ptr>.

Now, before I waste my time on this, I wanted to ask:

  • Do you know if it is possible to write a lock-free, atomic shared pointer at all?
  • Are there already implementations that I've overlooked and that are, ideally, even compatible with what you would expect from a std::atomic<std::shared_ptr>? For the mentioned list it would especially require a CAS operation.
  • If there is no way to implement this on current architectures, do you see any other benefit in Herb's implementation compared to a "normal" linked list that is protected by a lock?

For reference, here is the code from Herb Sutter (might contain typos from me):

template<class T>
class slist {
    struct Node { T t; std::shared_ptr<Node> next; };
    std::atomic<std::shared_ptr<Node>> head;
public:
    class reference {
        std::shared_ptr<Node> p;
    public:
        reference(std::shared_ptr<Node> p_) : p{std::move(p_)} {}
        T& operator*() { return p->t; }
        T* operator->() { return &p->t; }
    };
    auto find(T t) const {
        auto p = head.load();
        while (p && p->t != t) {
            p = p->next;
        }
        return reference(std::move(p));
    }
    void push_front(T t) {
        auto p = std::make_shared<Node>();
        p->t = t;
        p->next = head.load();
        while (!head.compare_exchange_weak(p->next, p)) {}
    }
    void pop_front() {
        auto p = head.load();
        while (p && !head.compare_exchange_weak(p, p->next)) {}
    }
};

Note that in this implementation, single instances of a shared_ptr can be accessed / modified by multiple different threads. They can be read/copied, reset and even deleted (as part of a node). So this is not about whether multiple different shared_ptr objects (that manage the same object) can be used by multiple threads without a race condition - that is already true for current implementations and required by the standard - but it is about concurrent access to a single pointer instance, which is - for standard shared pointers - no more thread-safe than the same operations on raw pointers would be.


To explain my motivation:
This is mainly an academic question. I have no intention of implementing my own lock-free list in production code, but I find the topic interesting, and at first glance Herb's presentation seemed to be a good introduction. However, while thinking about this question and @sehe's comment on my answer, I remembered this talk, had another look at it and realized that it doesn't make much sense to call Herb's implementation lock-free if its primitive operations require locks (which they currently do). So I was wondering whether this is just a limitation of the current implementations or a fundamental flaw in the design.

asked Jul 07 '15 by MikeMB



2 Answers

I'm adding this as an answer since it's too long to fit in a comment:

Something to consider. A lock-free shared_ptr is not needed to implement lock-free/wait-free data structures.

The reason Sutter uses shared_ptr in his presentation is because the most complicated part of writing lock-free data structures is not the synchronization, but the memory reclamation: we cannot delete nodes while they're potentially accessed by other threads, so we have to leak them and reclaim later. A lock-free shared_ptr implementation essentially provides "free" memory reclamation and makes examples of lock-free code palatable, especially in a context of a time-limited presentation.

Of course, having a lock-free atomic_shared_ptr as part of the Standard would be a huge help. But it's not the only way of doing memory reclamation for lock-free data structures. Alternatives include the naive approach of maintaining a list of nodes to be deleted and reclaiming them at quiescent points in execution (which works in low-contention scenarios only), hazard pointers, and rolling your own atomic reference counting using split counts.
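The "list of nodes to be deleted" scheme can be sketched as follows. This is a minimal, int-only sketch in the spirit of the lock-free stack from C++ Concurrency in Action; the class and member names are my own, and it is not production code. pop() only frees nodes when it is the sole thread inside pop(); otherwise it chains them onto a pending list for later reclamation:

```cpp
#include <atomic>

class lock_free_stack {
    struct node { int value; node* next; };
    std::atomic<node*> head{nullptr};
    std::atomic<unsigned> threads_in_pop{0};
    std::atomic<node*> to_be_deleted{nullptr};

    static void delete_nodes(node* n) {
        while (n) { node* next = n->next; delete n; n = next; }
    }
    // Atomically splice the chain starting at 'first' onto the pending list.
    void chain_pending(node* first) {
        node* last = first;
        while (last->next) last = last->next;
        last->next = to_be_deleted.load();
        while (!to_be_deleted.compare_exchange_weak(last->next, first)) {}
    }
    void try_reclaim(node* old_head) {
        if (threads_in_pop == 1) {
            // We are the only popper right now: claim the pending list.
            node* pending = to_be_deleted.exchange(nullptr);
            if (--threads_in_pop == 0) delete_nodes(pending);
            else if (pending) chain_pending(pending);  // raced: put it back
            delete old_head;                           // safe: we unlinked it
        } else {
            // Another thread may still be reading old_head: defer deletion.
            old_head->next = nullptr;
            chain_pending(old_head);
            --threads_in_pop;
        }
    }
public:
    void push(int v) {
        node* n = new node{v, head.load()};
        while (!head.compare_exchange_weak(n->next, n)) {}
    }
    bool pop(int& out) {
        ++threads_in_pop;
        node* old = head.load();
        while (old && !head.compare_exchange_weak(old, old->next)) {}
        bool ok = (old != nullptr);
        if (old) { out = old->value; try_reclaim(old); }
        else --threads_in_pop;
        return ok;
    }
};
```

The synchronization itself needs nothing beyond an atomic raw pointer and CAS; all the extra machinery exists purely to decide when a popped node may be freed, which is the point being made above.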

As for performance, @mksteve is correct: lock-free code is not guaranteed to outperform lock-based alternatives unless perhaps it runs on a highly parallel system offering true concurrency. Its goal is to enable maximum concurrency, and because of that what we typically get is threads doing less waiting at the cost of performing more work.

PS If this is something that interests you, you should consider taking a look at C++ Concurrency in Action by Anthony Williams. It dedicates a whole chapter to writing lock-free/wait-free code, which offers a good starting place, walking through implementations of lock-free stack and queue.

answered Oct 15 '22 by Alla


  • Do you know if it is possible to write a lock-free, atomic shared pointer at all?

  • Are there already implementations that I've overlooked and that are, ideally, even compatible with what you would expect from a std::atomic?

I think the std::atomic_... free functions offer a form of implementation, where the slist would perform the special atomic_... operations on the shared_ptr. The problem with this functionality being split across two classes (std::atomic and std::shared_ptr) is that each has constraints which need to be adhered to in order to function, and the class separation makes sharing that knowledge of the constraints impossible.

Within slist, which knows both items, the situation can be helped, and thus the atomic_... functions will probably work.

  • If there is no way to implement this on current architectures, do you see any other benefit in Herb's implementation compared to a "normal" linked list that is protected by a lock?

From Wikipedia's Non-blocking algorithm article: the purpose of the lock-free property is to guarantee that some progress is made by at least one thread.

This does not give a guarantee of better performance than a locked implementation, but does give a guarantee that deadlocks will not occur.

Imagine that T required a lock to perform a copy, and that this lock could also be held by operations outside the list. Then a deadlock would be possible if a lock-based implementation of slist were called while that lock was held.

I think CAS is provided by the compare_exchange_weak member of std::atomic, so it would be implementation-independent.

Current lock-free algorithms for complex structures (e.g. vector, map) tend to be significantly less efficient than locking algorithms (see Dr. Dobb's: Lock-Free Data Structures), but the benefit offered (improved thread progress) would significantly improve the performance of computers, which tend to have large numbers of idle CPUs.

Further research into the algorithms may identify new instructions which could be implemented in the CPUs of the future, to give us wait-free performance and improved utilization of computing resources.

answered Oct 15 '22 by mksteve