Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Avoid revisiting node in an invariant directed graph

This is perhaps related to functional data structures, but I found no tags about this topic.

Say I have a syntax tree type Tree, which is organised as a DAG by simply sharing common sub expressions. For example,

data Tree = Val Int | Plus Tree Tree

example :: Tree
example = let x = Val 42 in Plus x x

Then, on this syntax tree type, I have a pure function simplify :: Tree -> Tree, which, when given the root node of a Tree, simplifies the whole tree by first simplifying the children of that root node, and then handle the operation of the root node itself.

Since simplify is a pure function, and some nodes are shared, we expect not to call simplify multiple times on those shared nodes.

Here comes the problem. The whole data structure is invariant, and the sharing is transparent to the programmer, so it seems impossible to determine whether or not two nodes are in fact the same nodes.

The same problem happens when handling the so-called “tying-the-knot” structures. By tying the knot, we produce a finite data representation for an otherwise infinite data structure, e.g. let xs = 1 : xs in xs. Here xs itself is finite, but calling map succ on it does not necessarily produce a finite representation.

These problems can be concluded as such: when the data is organised in an invariant directed graph, how do we avoid revisiting the same node, doing duplicated work, or even resulting in non-termination when the graph happened to be cyclic?

Some ideas that I have thought of:

  1. Extend the Tree type to Tree a, making every nodes hold an extra a. When generating the graph, associate each node with a unique a value. The memory address should have worked here, despite that the garbage collector may move any heap object at any time.
  2. For the syntax tree example, we may store a STRef (Maybe Tree) in every node for the simplified version, but this might not be extensible, and injects some implementation detail of a specific operation to the whole data structure itself.
like image 682
Ruifeng Xie Avatar asked Jan 05 '20 07:01

Ruifeng Xie


1 Answers

This is a problem with a lot of research behind it. In general, you cannot observe sharing in a pure language like Haskell, due to referential transparency. But in practice, you can safely observe sharing so long as you restrict yourself to doing the observing in the IO monad. Andy Gill (one of the legends from the old Glasgow school of FP!) has written a wonderful paper about this about 10 years ago:

http://ku-fpg.github.io/files/Gill-09-TypeSafeReification.pdf

Very well worth reading, and the bibliography will give you pointers to prior art in this area and many suggested solutions, from "poor-man's morally-safe" approaches to fully monadic knot-tying techniques. In my mind, Andy's solution and the corresponding reify package in Hackage:

https://hackage.haskell.org/package/data-reify

are the most practical solutions to this problem. And I can tell from experience that they work really well in practice.

like image 196
alias Avatar answered Nov 16 '22 17:11

alias