Efficient linked list in C++?

This document says std::list is inefficient:

std::list is an extremely inefficient class that is rarely useful. It performs a heap allocation for every element inserted into it, thus having an extremely high constant factor, particularly for small data types.

Comment: that is surprising to me. std::list is a doubly linked list, so despite its inefficiency in element construction, it supports insert/delete in O(1) time complexity, but this feature is completely ignored in the quoted paragraph.

My question: Say I need a sequential container for small-sized homogeneous elements, and this container should support element insert/delete in O(1) complexity and does not need random access (though supporting random access is nice, it is not a must here). I also don't want the high constant factor introduced by heap allocation for each element's construction, at least when the number of elements is small. Lastly, iterators should be invalidated only when the corresponding element is deleted. Apparently I need a custom container class, which might (or might not) be a variant of a doubly linked list. How should I design this container?

If the aforementioned specification cannot be achieved, then perhaps I should use a custom memory allocator, say, a bump-pointer allocator? I know std::list takes an allocator as its second template argument.
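
For what it's worth, C++17 already ships a bump-style allocation strategy that plugs into that second template argument: std::pmr::list together with std::pmr::monotonic_buffer_resource. A minimal sketch (the buffer size is chosen arbitrarily here):

    #include <cstddef>
    #include <list>
    #include <memory_resource>

    int main() {
        // A monotonic_buffer_resource is essentially a bump-pointer allocator:
        // allocations are carved out of the buffer and all released at once.
        std::byte buffer[4096];
        std::pmr::monotonic_buffer_resource arena(buffer, sizeof(buffer));

        // std::pmr::list is std::list with a polymorphic_allocator as its
        // second template argument, here drawing nodes from the arena above.
        std::pmr::list<int> l(&arena);
        for (int i = 0; i < 100; ++i)
            l.push_back(i);  // each node allocation is a cheap pointer bump
    }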

Edit: I know I shouldn't be too concerned with this issue, from an engineering standpoint - fast enough is good enough. It is just a hypothetical question so I don't have a more detailed use case. Feel free to relax some of the requirements!

Edit2: I understand two algorithms of O(1) complexity can have entirely different performance due to the difference in their constant factors.

asked Aug 16 '17 by Leedehai


3 Answers

Your requirements are exactly those of std::list, except that you've decided you don't like the overhead of node-based allocation.

The sane approach is to start at the top and only do as much as you really need:

  1. Just use std::list.

    Benchmark it: is the default allocator really too slow for your purposes?

    • No: you're done.

    • Yes: goto 2

  2. Use std::list with an existing custom allocator such as the Boost pool allocator (a rough usage sketch follows this list)

    Benchmark it: is the Boost pool allocator really too slow for your purposes?

    • No: you're done.

    • Yes: goto 3

  3. Use std::list with a hand-rolled custom allocator finely tuned to your unique needs, based on all the profiling you did in steps 1 and 2

    Benchmark as before etc. etc.

  4. Consider doing something more exotic as a last resort.

    If you get to this stage, you should have a really well-specified SO question, with lots of detail about exactly what you need (e.g. "I need to squeeze n nodes into a cache line" rather than "this doc said this thing is slow and that sounds bad").
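
For step 2, a rough harness might look like the sketch below (assuming Boost.Pool is installed; the timing loop is only a placeholder for the real profiling discussed in the PS):

    #include <boost/pool/pool_alloc.hpp>  // Boost.Pool (assumed available)
    #include <chrono>
    #include <iostream>
    #include <list>

    int main() {
        using clock = std::chrono::steady_clock;

        // Same std::list interface, different allocator (step 2 above).
        std::list<int, boost::fast_pool_allocator<int>> pooled;

        auto t0 = clock::now();
        for (int i = 0; i < 1'000'000; ++i)
            pooled.push_back(i);
        auto t1 = clock::now();

        std::cout << "pooled push_back: "
                  << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count()
                  << " ms\n";
        // Run the same loop with plain std::list<int> and compare before
        // concluding that the default allocator is the bottleneck.
    }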


PS. The above makes two assumptions, but both are worth investigation:

  1. as Baum mit Augen points out, it's not sufficient to do simple end-to-end timing, because you need to be sure where your time is going. It could be the allocator itself, or cache misses due to the memory layout, or something else. If something's slow, you still need to be sure why before you know what ought to change.
  2. your requirements are taken as a given, but finding ways to weaken requirements is often the easiest way to make something faster.

    • do you really need constant-time insertion and deletion everywhere, or only at the front, or the back, or both but not in the middle?
    • do you really need those iterator invalidation constraints, or can they be relaxed?
    • are there access patterns you can exploit? If you're frequently removing an element from the front and then replacing it with a new one, could you just update it in-place?
answered by Useless

As an alternative, you can use a growable array and handle the links explicitly, as indexes into the array.

Unused array elements are put in a linked list using one of the links. When an element is deleted, it is returned to the free list. When the free list is exhausted, grow the array and use the next element.

For the new free elements, you have two options:

  • append them all to the free list at once,
  • append them on demand, based on the number of elements in the free list vs. the array size.
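
A minimal sketch of this approach, with invented names (push_front/erase only; real code would add iteration and the batch-append option from the list above):

    #include <cstddef>
    #include <vector>

    template <typename T>
    class IndexedList {
        static constexpr std::size_t npos = static_cast<std::size_t>(-1);

        struct Node {
            T value{};
            std::size_t prev = npos;
            std::size_t next = npos;  // doubles as the free-list link
        };

        std::vector<Node> nodes_;
        std::size_t head_ = npos;
        std::size_t free_ = npos;     // head of the free list

    public:
        // Insert at the front; the returned index plays the role of an
        // iterator and stays valid until that element is erased.
        std::size_t push_front(const T& v) {
            std::size_t i;
            if (free_ != npos) {          // reuse a slot from the free list
                i = free_;
                free_ = nodes_[i].next;
                nodes_[i].value = v;
            } else {                      // free list exhausted: grow the array
                i = nodes_.size();
                nodes_.push_back(Node{v, npos, npos});
            }
            nodes_[i].prev = npos;
            nodes_[i].next = head_;
            if (head_ != npos) nodes_[head_].prev = i;
            head_ = i;
            return i;
        }

        // Unlink the node and return its slot to the free list.
        void erase(std::size_t i) {
            if (nodes_[i].prev != npos) nodes_[nodes_[i].prev].next = nodes_[i].next;
            else head_ = nodes_[i].next;
            if (nodes_[i].next != npos) nodes_[nodes_[i].next].prev = nodes_[i].prev;
            nodes_[i].next = free_;
            free_ = i;
        }

        T& operator[](std::size_t i) { return nodes_[i].value; }
    };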
answered by Yves Daoust


The requirement of not invalidating any iterators except the one pointing to the node being deleted rules out every container that doesn't allocate individual nodes, and is much different from e.g. list or map.
However, I've found that in almost every case when I thought this was necessary, it turned out that with a little discipline I could just as well do without it. You might want to verify whether you can, as you would benefit greatly.

While std::list is indeed the "correct" thing if you need something like a list (for CS class, mostly), the statement that it is almost always the wrong choice is, unluckily, exactly right. While the O(1) claim is entirely true, it's nevertheless abysmal in relation to how actual computer hardware works, which gives it a huge constant factor. Note that not only are the objects that you iterate over randomly placed, but the nodes that you maintain are, too (yes, you can somehow work around that with an allocator, but that is not the point). On average, you have one or two guaranteed cache misses for anything you do, plus up to one or two dynamic allocations for mutating operations (one for the object, and another one for the node), depending on whether the implementation combines the two (see the edit below).

Edit: As pointed out by @ratchetfreak below, implementations of std::list commonly collapse the object and node allocation into one memory block as an optimization (akin to what e.g. make_shared does), which makes the average case somewhat less catastrophic (one allocation per mutation and one guaranteed cache miss instead of two).
A new, different consideration in this case might be that doing so may not be entirely trouble-free either. Postfixing the object with the two pointers means reversing the direction of traversal while dereferencing, which may interfere with automatic prefetching.
Prefixing the object with the pointers, on the other hand, means you push the object back by two pointers' size, which will mean as much as 16 bytes on a 64-bit system (that might split a mid-sized object over cache line boundaries every time). Also, one has to consider that std::list cannot afford to break e.g. SSE code solely because it adds a clandestine offset as a special surprise (so, for example, the xor trick would likely not be applicable for reducing the two-pointer footprint). There would likely have to be some amount of "safe" padding to make sure objects added to a list still work the way they should.
I am unable to tell whether these are actual performance problems or merely distrust and fear from my side, but I believe it's fair to say that there may be more snakes hiding in the grass than one expects.
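
For concreteness, here is a hypothetical combined node layout illustrating the "object pushed back by two pointers" point; it is not any particular standard-library implementation's actual node type:

    #include <cstddef>

    struct Payload {
        float data[4];   // e.g. something SSE-style code expects at a known alignment
    };

    // With the two pointers placed in front, the payload starts
    // 2 * sizeof(void*) bytes into the node (16 bytes on a typical
    // 64-bit system): the "clandestine offset" mentioned above.
    struct ListNode {
        ListNode* prev;
        ListNode* next;
        Payload   value;
    };

    static_assert(offsetof(ListNode, value) == 2 * sizeof(void*),
                  "payload is pushed back by two pointers");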

It's not for no reason that high-profile C++ experts (Stroustrup, notably) recommend using std::vector unless you have a really good reason not to.

Like many people before, I've tried to be smart about using (or inventing) something better than std::vector for one or the other particular, specialized problem where it seems you can do better, but it turns out that simply using std::vector is still almost always the best, or second best option (if std::vector happens to be not-the-best, std::deque is usually what you need instead).
You have way fewer allocations than with any other approach, way less memory fragmentation, way fewer indirections, and a much more favorable memory access pattern. And guess what, it's readily available and just works.
The fact that every now and then inserts require a copy of all elements is (usually) a total non-issue. You think it is, but it's not. It happens rarely and it is a copy of a linear block of memory, which is exactly what processors are good at (as opposed to many double-indirections and random jumps over memory).

If the requirement not to invalidate iterators is really an absolute must, you could for example pair a std::vector of objects with a dynamic bitset or, for lack of something better, a std::vector<bool>. Then use reserve() appropriately so reallocations do not happen. When deleting an element, do not remove it, but only mark it as deleted in the bitmap (and call the destructor by hand). At appropriate times, when you know that it is OK to invalidate iterators, call a "vacuum cleaner" function that compacts both the bit vector and the object vector. With that, all unforeseeable iterator invalidations are gone.

Yes, that requires maintaining one extra "element was deleted" bit, which is annoying. But a std::list must maintain two pointers as well, in addition to the actual object, and it must do allocations. With the vector (or two vectors), access is still very efficient, as it happens in a cache-friendly way. Iterating, even when checking for deleted nodes, still means you move linearly or almost-linearly over memory.
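
A minimal sketch of that vector-plus-bitmap idea, with invented names (it skips the manual-destructor detail and simply compacts on vacuum; reserve() is assumed to be large enough for the workload):

    #include <cstddef>
    #include <utility>
    #include <vector>

    template <typename T>
    class StableVector {
        std::vector<T> items_;
        std::vector<bool> dead_;    // "element was deleted" bitmap

    public:
        explicit StableVector(std::size_t capacity) {
            items_.reserve(capacity);   // avoid reallocation-driven invalidation
            dead_.reserve(capacity);
        }

        std::size_t push_back(const T& v) {
            items_.push_back(v);
            dead_.push_back(false);
            return items_.size() - 1;
        }

        // "Delete": only mark; the slot is skipped during iteration.
        void erase(std::size_t i) { dead_[i] = true; }

        bool alive(std::size_t i) const { return !dead_[i]; }
        T& operator[](std::size_t i) { return items_[i]; }
        std::size_t size() const { return items_.size(); }

        // Call only when it is OK to invalidate everything: compacts both
        // the object vector and the bitmap.
        void vacuum() {
            std::size_t out = 0;
            for (std::size_t i = 0; i < items_.size(); ++i)
                if (!dead_[i]) {
                    if (out != i) items_[out] = std::move(items_[i]);
                    ++out;
                }
            items_.resize(out);
            dead_.assign(out, false);
        }
    };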

answered by Damon