Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there anything like Haskell Data.Sequence in C++?

Does any C++ library implements something like Haskell Data.Sequence container?

I'm mostly interested in:

  1. Maintaining the elements order (In which they were inserted).
  2. O(logn) access by an index. Aka operator[](size_type pos).
  3. O(logn) insertion/deletion in the middle (by an index).
like image 681
willir Avatar asked Jan 07 '18 17:01

willir


1 Answers

It seems to me that the way to implement* such a data structure you'd need a tree that stores the number of elements in each node. It allows insertion and retrieval in O(log(N)), and indices are maintained simply by counting how many elements there are to the "left" of a given node in the tree.

* I'm answering maybe a slightly different question here, the actual question asks for a recommendation for a library, which is explicitly off-topic at SO.

A node of this tree would look like this:

template<typename T>
struct Node {
  Node* left;
  Node* right;
  size_t elements;
  T value
  
  T& access(size_t index) {
    if (left->elements == index) {
      return value;
    } else if (left->elements > index) {
      return left->access(index);
    } else {
      return right->access(index - left->elements - 1);
    }
  }

  void insert(size_t index, T&& value) {
    // insert `value` at right place, increment `elements`
  }
}

(I left the insert method as an exercise to the reader.)

Edit: As willir (OP) mentioned in the comments below, the tree would need to be a balanced tree. Arne Vogel suggests that B-trees would be the best choice for cache locality.


But:

If you do implement something like this, make sure that you measure your application, compare it to std::vector. Inserting into a vector at an arbitrary location is O(N), not O(log(N)), but it's O(N) of a very cheap operation. A vector has many advantages over such a data structure:

  1. No code to maintain.

  2. Less stuff to store (in a tree you need to store two pointers and a count, which are not necessary in a vector), meaning more of your data fits in the cache at the same time.

  3. Data is always accessed in the same order in which it is stored. With trees you need to traverse nodes that can be stored in arbitrary locations in memory, they don't need to be close together, and are likely not stored in the order in which they're read.

Points 2 and 3 mean that vectors have many fewer cache misses. This can make for a huge difference in timings.

Moving data around in a vector becomes rather slow if each data element is large. But in this case you should keep pointers to your data in the vector, so that you're moving lists of pointers around, rather than your actual data. For such large data elements, I would suggest allocating each one independently, and keeping its pointer in a std::vector<std::unique_ptr<T>>.

Here are some relevant links:

  • DZone: C++ benchmark – std::vector VS std::list

  • YouTube: Day 1 Keynote - Bjarne Stroustrup: C++11 Style

  • SO: Relative performance of std::vector vs. std::list vs. std::slist?

like image 163
Cris Luengo Avatar answered Nov 15 '22 01:11

Cris Luengo