I've just started working my way through Okasaki's Purely Functional Data Structures, but have been doing things in Haskell rather than Standard ML. However, I've come across an early exercise (2.5) that's left me a bit stumped on how to do things in Haskell:
Inserting an existing element into a binary search tree copies the entire search path even though the copied nodes are indistinguishable from the originals. Rewrite insert using exceptions to avoid this copying. Establish only one handler per insertion rather than one handler per iteration.
Now, my understanding is that ML, being an impure language, gets by with a conventional approach to exception handling not so different to, say, Java's, so you can accomplish it something like this:
type Tree = E | T of Tree * int * Tree
exception ElementPresent
fun insert (x, t) =
let fun go E = T (E, x, E)
fun go T(l, y, r) =
if x < y then T(go (l), x, r)
else if y < x then T(l, x, go (r))
else raise ElementPresent
in go t
end
handle ElementPresent => t
I don't have an ML implementation, so this may not be quite right in terms of the syntax.
My issue is that I have no idea how this can be done in Haskell, outside of doing everything in the IO
monad, which seems like cheating and even if it's not cheating, would seriously limit the usefulness of a function which really doesn't do any mutation. I could use the Maybe
monad:
data Tree a = Empty | Fork (Tree a) a (Tree a)
deriving (Show)
insert :: (Ord a) => a -> Tree a -> Tree a
insert x t = maybe t id (go t)
where go Empty = return (Fork Empty x Empty)
go (Fork l y r)
| x < y = do l' <- go l; return (Fork l' y r)
| x > y = do r' <- go r; return (Fork l y r')
| otherwise = Nothing
This means everything winds up wrapped in Just
on the way back up when the element isn't found, which requires more heap allocation, and sort of defeats the purpose. Is this allocation just the price of purity?
EDIT to add: A lot of why I'm wondering about the suitability of the Maybe
solution is that the optimization described only seems to save you all the constructor calls you would need in the case where the element already exists, which means heap allocations proportional to the length of the search path. The Maybe
also avoids those constructor calls when the element already exists, but then you get a number of Just
constructor calls equal to the length of the search path. I understand that a sufficiently smart compiler could elide all the Just
allocations, but I don't know if, say, the current version of GHC is really that smart.
Whenever an element is to be inserted, first locate its proper location. Start searching from the root node, then if the data is less than the key value, search for the empty location in the left subtree and insert the data. Otherwise, search for the empty location in the right subtree and insert the data.
For a binary tree to be a binary search tree, the data of all the nodes in the left sub-tree of the root node should be the data of the root. The data of all the nodes in the right subtree of the root node should be the data of the root. In Fig. 1, consider the root node with data = 10.
In terms of cost, the ML version is actually very similar to your Haskell version.
Every recursive call in the ML version results in a stack frame. The same is true in the Haskell version. This is going to be proportional in size to the path that you traverse in the tree. Also, both versions will of course allocate new nodes for the entire path if an insertion is actually performed.
In your Haskell version, every recursive call might also eventually result in the
allocation of a Just
node. This will go on the minor heap, which is just a block of
memory with a bump pointer. For all practical purposes, GHC's minor heap is roughly equivalent in
cost to the stack. Since these are short-lived allocations, they won't normally end up
being moved to the major heap at all.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With