Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Are there versions of the C++ STL's associative data structures optimized for numerous partial copies?

I have a large tree that grows as my algorithm progresses. Each node contains set, which I suppose is implemented as balanced binary search tree. Each node's set shall remain fixed after that node's creation, before its use in creating that node's children.

I fear however that copying each set is prohibitively expensive. Instead, I would prefer that each newly created node's set utilize all appropriate portions of the parent node's set. In short, I'm happy copying O(log n) of the set but not O(n).

Are there any variants of the STL's associative data structures that offer such an partial copy optimization? Perhaps in Boost? Such a data structure would be trivial to implement in Haskell or OCaML of course, but it'd require more effort in C++.

like image 714
Jeff Burdges Avatar asked Oct 04 '11 16:10

Jeff Burdges


1 Answers

I know it's not generally productive to suggest a different language, but Haskell's standard container libraries do exactly this. I remember seeing a video (was it Simon Peyton Jones?) talking about this exact problem, and how a Haskell solution ended up being much faster than a C++ solution for the given programmer effort. Of course, this was for a specific problem that had a lot of sets with a lot of shared elements.

There is a fair amount of research into this subject. If you are looking for keywords, I suggest searching for "functional data structures" instead of "immutable data structures", since most functional paradigms benefit from immutability in general. Structures such as finger tree were developed to solve exactly this problem.

I know of no C++ library that implements these data structures. There is nothing stopping you from reading the relevant papers (or the Haskell source code, which is about 1k lines for Data.Set including tests) and implementing it yourself, but I know that is not what you'd want to hear. You'd also need do some kind of reference counting for the shared nodes, which for such deep structures can have a higher overhead than even simple garbage collectors.

like image 113
Dietrich Epp Avatar answered Nov 17 '22 07:11

Dietrich Epp