Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Purely functional data structures with copy-on-write?

I want to have the advantage of functional data structures (multiple versions of data that can share structure) but be able to modify it in an imperative style.

What I'm thinking about (and a possible use): a RPG game in which whole game history is stored (for example, to allow for travelling back in time). Using copy-on-write, I could simply clone the structure holding game state and modify it by introducing a new turn - but have access to the earlier turns (not necessarily all of them, maybe just selected snapshots of game state), without the penalty of having to copy everything every time.


Let's say foo is a map.

bar = foo.clone()

Nothing of foo's structure (for example, tree) gets copied yet. However, from now on bar is treated as a copy and no changes are allowed to propagate back to `foo'.

baz = bar[someKey]
baz.modifyInSomeWay()

Now

  • a new object gets created, that is a modified copy of baz.
  • bar is replaced with a new map, holding new baz (possibly retaining some of foo's structure).
  • foo is unaffected.

But if we then do...

baz.modifyAgain()

...baz can be just modified, because we have a recent version of it. bar doesn't need to be changed.

All this requires holding some version information about foo and bar, increasing it on foo.clone(), and passing it to baz somehow (by making it a proxy object?).

Also, any part of structure that has been cloned becomes a 'part of history' and can't be changed anymore, which could be enforced at run-time.


This resembles JavaScript's prototypes a bit, but I it's more as it allows for changes to propagate upwards. I think it would be something like a version control system.

  • Has this been done, and to what extent?
  • Is this a good idea? If not, is there a way to save it?
  • How could it be implemented? I was thinking about building it on top of some high-level GC-ed language, like Python.
like image 621
hmp Avatar asked Sep 04 '09 21:09

hmp


1 Answers

Functional ("persistent") data structures are typically recursively built up out of immutable nodes (e.g. singly linked list where common suffixes are shared, search tree or heap where only the parts of the tree structure that are on the path from the root to the updated item need copying).

Anything where the entire set has to be copied for every modification is bad. In those cases, you'd tend to overlay small "diffs" that are checked (taking extra time) with recursion to previous versions. Every so often, when the diffs become too large, you could do a deep copy/rebuild (so the amortized cost is not as bad).

Persistent data structures can have significant overhead, but if you have very efficient allocation of small objects (JVM GC qualifies), they can be very practical - in the best case, equally as fast as the mutable equivalent, giving persistence only at the cost of the memory used - which can be much less than a full copy on every update.

Depending on your language, you'll probably find the syntax you want difficult to achieve without operator overloading for assignment. Lvalues (mutable references) in C++ definitely require non-persistent semantics.

like image 143
Jonathan Graehl Avatar answered Nov 02 '22 08:11

Jonathan Graehl