Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Tree or not (Haskell type understanding)

In "A Gentle Introduction to Haskell" we have such declaration of Tree's type:

data Tree a = Leaf a | Branch (Tree a) (Tree a)
     deriving (Eq,Ord,Show,Read)

Let's to make some values of this type:

a1 = Leaf 1
a2 = Leaf 2
a3 = Leaf 3
a4 = a1 `Branch` a2
a5 = a2 `Branch` a3
a6 = a4 `Branch` a5

in ghci:

*Main> :t a6
a6 :: Tree Integer

But a6 is not Tree at all, see:

      a6
     / \
   a4   a5
  / \  /  \
a1   a2    a3

There is a cycle in this graph! What is wrong? Is type definition of Tree correct? Or maybe, I can't catch some mistake in understanding of this example...

like image 534
Vladimir Avatar asked Dec 11 '22 04:12

Vladimir


2 Answers

The short answer is that conceptually, a2 "is" a Tree a, not a reference to a Tree a. In that sense, a6 really looks like

          a6
       /      \
     a4        a5
    /   \    /    \
  a1   a2   a2    a3

that is, there are two "copies" of a2 in the tree.

Practically speaking, since all the values are immutable, the implementation can use the same memory to represent both the right child of a4 and the left child of a5, but the two nodes remain distinct at the level of abstraction represented by the Tree a type.

In order to actually have a cycle, there needs to be some notion of being able to reach both a4 and a5 from a2, and this type doesn't provide a representation for any such child-to-parent links, which makes it impossible to tell if a4's left child and a5's right child are the same node, or two different nodes that look exactly alike. For this data type, the distinction simply does not exist.

like image 61
chepner Avatar answered Dec 12 '22 18:12

chepner


We should distinguish between the value of an expression, and its memory representation.

For instance, both these expressions have the same value:

e1 = ("Hello", "Hello")
e2 = let s = "Hello" in (s, s)

Indeed, in Haskell there is no way to distinguish between the results of the evaluation of the expressions above. They are semantically equivalent.

A Haskell implementation (e.g. GHC) is free to represent those values in memory in any way that does not break the semantics. For instance:

  1. It might store the string "Hello" twice in memory, and then use a pair of pointers (p1, p2).
  2. It might store the string "Hello" once in memory, and then use a pair of pointers (p, p).

Note that both representations could, in theory, be used for any one of the expressions e1,e2 above. In practice, GHC will use the former for e2 and the latter for e1, but that's not important.

In your trees, the same issue arises. The value of your a6 is a tree. GHC probably represents that tree as a non-tree DAG (i.e. a DAG which, if transformed into an undirected graph, has a cycle), but it does not matter, it's only an implementation detail. The important aspect is that such representation respects the semantics of a tree.

One might then wonder why a DAG representation is sound if the value is a tree. This holds because in Haskell we can not compare the underlying "references" p used by GHC. If we had a function comparePtr :: a -> a -> Bool which compares such references we could distinguish between e1 and e2 by using comparePtr (fst e) (snd e) for e between e1,e2. That would horribly break the soundness of the implementation. In Haskell we do not have that, though.

(Well, technically, there is an unsafe function doing that, but unsafe functions should never be used in "normal" code. We usually pretend those do not exist.)

like image 28
chi Avatar answered Dec 12 '22 18:12

chi