This is perhaps related to functional data structures, but I found no tags about this topic.
Say I have a syntax tree type Tree
, which is organised as a DAG by simply sharing common sub expressions. For example,
data Tree = Val Int | Plus Tree Tree
example :: Tree
example = let x = Val 42 in Plus x x
Then, on this syntax tree type, I have a pure function simplify :: Tree -> Tree
, which, when given the root node of a Tree
, simplifies the whole tree by first simplifying the children of that root node, and then handle the operation of the root node itself.
Since simplify
is a pure function, and some nodes are shared, we expect not to call simplify
multiple times on those shared nodes.
Here comes the problem. The whole data structure is invariant, and the sharing is transparent to the programmer, so it seems impossible to determine whether or not two nodes are in fact the same nodes.
The same problem happens when handling the so-called “tying-the-knot” structures. By tying the knot, we produce a finite data representation for an otherwise infinite data structure, e.g. let xs = 1 : xs in xs
. Here xs
itself is finite, but calling map succ
on it does not necessarily produce a finite representation.
These problems can be concluded as such: when the data is organised in an invariant directed graph, how do we avoid revisiting the same node, doing duplicated work, or even resulting in non-termination when the graph happened to be cyclic?
Some ideas that I have thought of:
Tree
type to Tree a
, making every nodes hold an extra a
. When generating the graph, associate each node with a unique a
value. The memory address should have worked here, despite that the garbage collector may move any heap object at any time. STRef (Maybe Tree)
in every node for the simplified version, but this might not be extensible, and injects some implementation detail of a specific operation to the whole data structure itself. This is a problem with a lot of research behind it. In general, you cannot observe sharing in a pure language like Haskell, due to referential transparency. But in practice, you can safely observe sharing so long as you restrict yourself to doing the observing in the IO
monad. Andy Gill (one of the legends from the old Glasgow school of FP!) has written a wonderful paper about this about 10 years ago:
http://ku-fpg.github.io/files/Gill-09-TypeSafeReification.pdf
Very well worth reading, and the bibliography will give you pointers to prior art in this area and many suggested solutions, from "poor-man's morally-safe" approaches to fully monadic knot-tying techniques. In my mind, Andy's solution and the corresponding reify
package in Hackage:
https://hackage.haskell.org/package/data-reify
are the most practical solutions to this problem. And I can tell from experience that they work really well in practice.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With