In "A Gentle Introduction to Haskell" we have such declaration of Tree's type:
data Tree a = Leaf a | Branch (Tree a) (Tree a)
deriving (Eq,Ord,Show,Read)
Let's to make some values of this type:
a1 = Leaf 1
a2 = Leaf 2
a3 = Leaf 3
a4 = a1 `Branch` a2
a5 = a2 `Branch` a3
a6 = a4 `Branch` a5
in ghci:
*Main> :t a6
a6 :: Tree Integer
But a6
is not Tree at all, see:
a6
/ \
a4 a5
/ \ / \
a1 a2 a3
There is a cycle in this graph! What is wrong? Is type definition of Tree correct? Or maybe, I can't catch some mistake in understanding of this example...
The short answer is that conceptually, a2
"is" a Tree a
, not a reference to a Tree a
. In that sense, a6
really looks like
a6
/ \
a4 a5
/ \ / \
a1 a2 a2 a3
that is, there are two "copies" of a2
in the tree.
Practically speaking, since all the values are immutable, the implementation can use the same memory to represent both the right child of a4
and the left child of a5
, but the two nodes remain distinct at the level of abstraction represented by the Tree a
type.
In order to actually have a cycle, there needs to be some notion of being able to reach both a4
and a5
from a2
, and this type doesn't provide a representation for any such child-to-parent links, which makes it impossible to tell if a4
's left child and a5
's right child are the same node, or two different nodes that look exactly alike. For this data type, the distinction simply does not exist.
We should distinguish between the value of an expression, and its memory representation.
For instance, both these expressions have the same value:
e1 = ("Hello", "Hello")
e2 = let s = "Hello" in (s, s)
Indeed, in Haskell there is no way to distinguish between the results of the evaluation of the expressions above. They are semantically equivalent.
A Haskell implementation (e.g. GHC) is free to represent those values in memory in any way that does not break the semantics. For instance:
"Hello"
twice in memory, and then use a pair of pointers (p1, p2)
. "Hello"
once in memory, and then use a pair of pointers (p, p)
.Note that both representations could, in theory, be used for any one of the expressions e1,e2
above. In practice, GHC will use the former for e2
and the latter for e1
, but that's not important.
In your trees, the same issue arises. The value of your a6
is a tree. GHC probably represents that tree as a non-tree DAG (i.e. a DAG which, if transformed into an undirected graph, has a cycle), but it does not matter, it's only an implementation detail. The important aspect is that such representation respects the semantics of a tree.
One might then wonder why a DAG representation is sound if the value is a tree. This holds because in Haskell we can not compare the underlying "references" p
used by GHC. If we had a function comparePtr :: a -> a -> Bool
which compares such references we could distinguish between e1
and e2
by using comparePtr (fst e) (snd e)
for e
between e1
,e2
. That would horribly break the soundness of the implementation. In Haskell we do not have that, though.
(Well, technically, there is an unsafe
function doing that, but unsafe
functions should never be used in "normal" code. We usually pretend those do not exist.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With