Tree or not (Haskell type understanding)

Question

In "A Gentle Introduction to Haskell" we have such declaration of Tree's type:

data Tree a = Leaf a | Branch (Tree a) (Tree a)
     deriving (Eq,Ord,Show,Read)

Let's to make some values of this type:

a1 = Leaf 1
a2 = Leaf 2
a3 = Leaf 3
a4 = a1 `Branch` a2
a5 = a2 `Branch` a3
a6 = a4 `Branch` a5

in ghci:

*Main> :t a6
a6 :: Tree Integer

But a6 is not Tree at all, see:

      a6
     / \
   a4   a5
  / \  /  \
a1   a2    a3

There is a cycle in this graph! What is wrong? Is type definition of Tree correct? Or maybe, I can't catch some mistake in understanding of this example...

chepner · Accepted Answer

The short answer is that conceptually, a2 "is" a Tree a, not a reference to a Tree a. In that sense, a6 really looks like

          a6
       /      \
     a4        a5
    /   \    /    \
  a1   a2   a2    a3

that is, there are two "copies" of a2 in the tree.

Practically speaking, since all the values are immutable, the implementation can use the same memory to represent both the right child of a4 and the left child of a5, but the two nodes remain distinct at the level of abstraction represented by the Tree a type.

In order to actually have a cycle, there needs to be some notion of being able to reach both a4 and a5 from a2, and this type doesn't provide a representation for any such child-to-parent links, which makes it impossible to tell if a4's left child and a5's right child are the same node, or two different nodes that look exactly alike. For this data type, the distinction simply does not exist.

chi · Answer

We should distinguish between the value of an expression, and its memory representation.

For instance, both these expressions have the same value:

e1 = ("Hello", "Hello")
e2 = let s = "Hello" in (s, s)

Indeed, in Haskell there is no way to distinguish between the results of the evaluation of the expressions above. They are semantically equivalent.

A Haskell implementation (e.g. GHC) is free to represent those values in memory in any way that does not break the semantics. For instance:

It might store the string "Hello" twice in memory, and then use a pair of pointers (p1, p2).
It might store the string "Hello" once in memory, and then use a pair of pointers (p, p).

Note that both representations could, in theory, be used for any one of the expressions e1,e2 above. In practice, GHC will use the former for e2 and the latter for e1, but that's not important.

In your trees, the same issue arises. The value of your a6 is a tree. GHC probably represents that tree as a non-tree DAG (i.e. a DAG which, if transformed into an undirected graph, has a cycle), but it does not matter, it's only an implementation detail. The important aspect is that such representation respects the semantics of a tree.

One might then wonder why a DAG representation is sound if the value is a tree. This holds because in Haskell we can not compare the underlying "references" p used by GHC. If we had a function comparePtr :: a -> a -> Bool which compares such references we could distinguish between e1 and e2 by using comparePtr (fst e) (snd e) for e between e1,e2. That would horribly break the soundness of the implementation. In Haskell we do not have that, though.

(Well, technically, there is an unsafe function doing that, but unsafe functions should never be used in "normal" code. We usually pretend those do not exist.)

Tree or not (Haskell type understanding)

Tags:

types

haskell

graph-theory

binary-tree

Vladimir

2 Answers

chepner

chi

Recent Activity

Donate For Us

Tree or not (Haskell type understanding)

Tags:

types

haskell

graph-theory

binary-tree

Vladimir

2 Answers

chepner

chi

Related questions

Recent Activity

Donate For Us