Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to implement an intrusive linked list that avoids undefined behavior?

Tags:

c++

c++11

For the third time in a few years I find myself needing an intrusive linked list for a project that does not allow boost (ask management...).

For the third time I find that the intrusive linked list implementation I have works perfectly, but that I really don't like that it makes use of undefined behavior - namely when converting the pointer to a list node into a pointer to the object containing that list node.

That awful code currently looks like this:

struct IntrusiveListNode {
    IntrusiveListNode * next_;
    IntrusiveListNode * prev_;
};

template <typename T, IntrusiveListNode T::*member>
class IntrusiveList {
// snip ...
private:
    T & nodeToItem_(IntrusiveListNode & node) {
        return *(T*)(((char*)&node)-((size_t)&(((T*)nullptr)->*member)));
    }

    IntrusiveListNode root_;
};

I don't really care how ugly nodeToItem_ gets, but I would like to keep the public interface and syntax of IntrusiveList the same. Specifically, I would like to specify the type of a list type using IntrusiveList<Test, &Test::node_> rather than IntrusiveList<Test, offsetof(Test, node_)>.

It's almost 2016 - is there any way to do this without invoking undefined behavior?


Edit: There have been a few suggested solutions (involving different structures of the list) in the comments which I want to summarize here:

  1. Live with undefined behavior, since the language has seemingly arbitrary limitations that prevent using member pointers in reverse.

  2. Store an additional pointer to the containing class within IntrusiveListNode. This is currently probably the cleanest solution (no change in interface necessary), but does require a third pointer in every list node (small optimizations may be possible).

  3. Derive from IntrusiveListNode and use static_cast. In boost this is the base_hook version of an intrusive linked list. I would like to stick with the member_hook version to avoid introducing multiple inheritance.

  4. Store pointers to the next and previous containing class instead of to the next and previous list node within IntrusiveListNode. This makes creating the root node within the intrusive list difficult. Either the list must include a full instantiation of T (which is not possible e.g. if T is abstract), or the end of the list would need to be a null pointer (which would break --list.end(), allowing forward iteration only).

  5. Boost intrusive lists have a member_hook version that works somehow, but the implementation has not been understood (and it possibly also relies on undefined behavior).

The question remains: is it possible to make an intrusive member-based list with bidirectional iteration support, no undefined behavior, and no "unnecessary" memory overhead?

like image 761
zennehoy Avatar asked Dec 07 '15 13:12

zennehoy


People also ask

What is an intrusive linked list?

Intrusive linked lists are a variation of linked lists where the links are embedded in the structure that’s being linked. In a typical linked list implementation, a list node contains a data pointer to the linked data and a next pointer to the next node in the list.

Why do we use intrusive list in C++?

There are two main reasons to use intrusive lists over non-intrusive linked lists: Fewer memory allocations. Less cache thrashing. With non-intrusive linked lists, creating a new object and adding it to a list requires two memory allocations: one for the object, and one for the list node.

What is a list node in a linked list?

In a typical linked list implementation, a list node contains a data pointer to the linked data and a next pointer to the next node in the list. In an intrusive linked list implementation, the list node contains next pointer to the next list node, but no data pointer because the list is embedded in the linked object itself.

What is doubly linked list in C++?

A doubly linked list is a linked list that keeps pointers to both the next node and the previous node. The list structure would contain an extra prev pointer: Doubly linked lists make deletion and insertion easier, because you only need a reference to a single node to perform the deletion or insertion.


1 Answers

I would side-step the problem and use a node<T> containing suitable members to link the range. To deal with a bidirectional, intrusive list I'd use an asymmetric node<T> like this:

template <typename T>
class intrusive::node
{
    template <typename S, node<S> S::*> friend class intrusive::list;
    template <typename S, node<S> S::*> friend class intrusive::iterator;

    T*       next;
    node<T>* prev;
public:
    node(): next(), prev() {}
    node(node const&) {}
    void operator=(node const&) {}
};

The basic idea is that the list<T, L> contains a node<T> using the next pointer to point to the first element. That's fairly straight forward: given a pointer p to a T the link to the next node can be traversed using (p->*L).next. However, instead of directly navigating the list using T*, an iterator<T, L> actually uses a pointer to the node<T>: while that isn't necessary for forward traversal, it enables backward traversal and insertions anywhere in the list without special treatment of the list head.

The copy constructor and copy assignment are defined to do nothing to avoid half-inserted nodes when copying a node. Depending on the needs of the nodes it may be more reasonable to rather = delete these operations. However, that's unrelated to the question at hand.

The iterator just uses a pointer to the node<T> whose next member points at the current node. For the first element in the list this is a pointer to the list<T, L>'s node<T> member. Assuming you got a pointer to a suitable node<T> an iterator<T, L> can be created from that:

template <typename T, intrusive::node<T> T::*Link>
class intrusive::iterator
{
    template <typename S, node<S> S::*> friend class intrusive::list;
    node<T>* current;

public:
    explicit iterator(node<T>* current): current(current) {}
    T& operator*() { return *this->operator->(); }
    T* operator->() { return this->current->next; }
    bool operator== (iterator const& other) const {
        return this->current == other.current;
    }
    bool operator!= (iterator const& other) const {
        return !(*this == other);
    }
    iterator& operator++() {
        this->current = &(this->current->next->*Link);
        return *this;
    }
    iterator operator++(int) {
        iterator rc(*this);
        this->operator++();
        return rc;
    }
    iterator& operator--() {
        this->current = this->current->prev;
        return *this;
    }
    iterator operator--(int) {
        iterator rc(*this);
        this->operator--();
        return rc;
    }
};

Dereferencing just uses the next pointer. The same is true for forward iteration which uses the next pointer together with the member pointer to get hold of the address of the next node<T>. Since the iterator's prev already pointesr at a node<T> backward iteration just needs to replace the current node<T> with the prev element.

Finally, this leaves a list maintaining the beginning and the end of the list. Dealing with the bidirectional access and the corresponding access to the last node adds a bit of complexity and the need to actually have a dedicated node. Here is an implementation (which isn't thoroughly tested: I may have messed up some of the links):

template <typename T, intrusive::node<T> T::*Link>
class intrusive::list
{
    node<T> content;

public:
    list() { this->content.prev = &this->content; }
    iterator<T, Link> begin() { return iterator<T, Link>(&this->content); }
    iterator<T, Link> end() { return iterator<T, Link>(this->content.prev); }

    T& front() { return *this->content.next; }
    T& back() { return *(this->content.prev->prev->next); }
    bool empty() const { return &this->content == this->content.prev; }
    void push_back(T& node) { this->insert(this->end(), node); }
    void push_front(T& node) { this->insert(this->begin(), node); }
    void insert(iterator<T, Link> pos, T& node) {
        (node.*Link).next = pos.current->next;
        ((node.*Link).next
         ? (pos.current->next->*Link).prev 
         : this->content.prev) = &(node.*Link);
        (node.*Link).prev = pos.current;
        pos.current->next = &node;
    }
    iterator<T, Link> erase(iterator<T, Link> it) {
        it.current->next = (it.current->next->*Link).next;
        (it.current->next
         ? (it.current->next->*Link).prev
         : this->content.prev) = it.current;
        return iterator<T, Link>(&(it.current->next->*Link));
    }
};

Just for a bit of sanity: here is a function to simply print the list:

template <typename T, intrusive::node<T> T::*Link>
std::ostream& intrusive::operator<< (std::ostream& out, intrusive::list<T, Link>& list)
{
    out << "[";
    if (!list.empty()) {
        std::copy(list.begin(), --list.end(), std::ostream_iterator<T>(out, ", "));
        out << list.back();
    }
    return out << "]";
}

There are few other approaches avoiding the need to do any funky access to the enclosing class. The above avoids a couple of conditions. Assuming I managed to set the appropriate links correct the code would not rely on any implementation defined or undefined behavior.

You'd use the list like this:

class Node {
public:
    intrusive::node<Node> link0;
    intrusive::node<Node> link1;
    int                   n;
    Node(int n): n(n) {}
};
std::ostream& operator<< (std::ostream& out, Node const& n) {
    return out << n.n;
}

int main()
{
    intrusive::list<Node, &Node::link0> l0;
    intrusive::list<Node, &Node::link1> l1;

    Node n[] = { 10, 11, 12, 13, 14, 15 };

    l0.push_front(n[0]);
    l0.push_front(n[1]);
    l0.push_front(n[2]);

    l1.push_back(n[0]);
    l1.push_back(n[1]);
    l1.push_back(n[2]);

    std::cout << "l0=" << l0 << " l1=" << l1 << "\n";
}
like image 71
Dietmar Kühl Avatar answered Oct 19 '22 07:10

Dietmar Kühl