Does any C++
library implements something like Haskell
Data.Sequence
container?
I'm mostly interested in:
O(logn)
access by an index. Aka operator[](size_type pos)
.O(logn)
insertion/deletion in the middle (by an index).It seems to me that the way to implement* such a data structure you'd need a tree that stores the number of elements in each node. It allows insertion and retrieval in O(log(N)), and indices are maintained simply by counting how many elements there are to the "left" of a given node in the tree.
* I'm answering maybe a slightly different question here, the actual question asks for a recommendation for a library, which is explicitly off-topic at SO.
A node of this tree would look like this:
template<typename T>
struct Node {
Node* left;
Node* right;
size_t elements;
T value
T& access(size_t index) {
if (left->elements == index) {
return value;
} else if (left->elements > index) {
return left->access(index);
} else {
return right->access(index - left->elements - 1);
}
}
void insert(size_t index, T&& value) {
// insert `value` at right place, increment `elements`
}
}
(I left the insert
method as an exercise to the reader.)
Edit: As willir (OP) mentioned in the comments below, the tree would need to be a balanced tree. Arne Vogel suggests that B-trees would be the best choice for cache locality.
If you do implement something like this, make sure that you measure your application, compare it to std::vector
. Inserting into a vector at an arbitrary location is O(N), not O(log(N)), but it's O(N) of a very cheap operation. A vector has many advantages over such a data structure:
No code to maintain.
Less stuff to store (in a tree you need to store two pointers and a count, which are not necessary in a vector), meaning more of your data fits in the cache at the same time.
Data is always accessed in the same order in which it is stored. With trees you need to traverse nodes that can be stored in arbitrary locations in memory, they don't need to be close together, and are likely not stored in the order in which they're read.
Points 2 and 3 mean that vectors have many fewer cache misses. This can make for a huge difference in timings.
Moving data around in a vector becomes rather slow if each data element is large. But in this case you should keep pointers to your data in the vector, so that you're moving lists of pointers around, rather than your actual data. For such large data elements, I would suggest allocating each one independently, and keeping its pointer in a std::vector<std::unique_ptr<T>>
.
Here are some relevant links:
DZone: C++ benchmark – std::vector VS std::list
YouTube: Day 1 Keynote - Bjarne Stroustrup: C++11 Style
SO: Relative performance of std::vector vs. std::list vs. std::slist?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With