Is there a way to avoid copying the whole search path of a binary tree on insert?

I've just started working my way through Okasaki's Purely Functional Data Structures, but have been doing things in Haskell rather than Standard ML. However, I've come across an early exercise (2.5) that's left me a bit stumped on how to do things in Haskell:

Inserting an existing element into a binary search tree copies the entire search path even though the copied nodes are indistinguishable from the originals. Rewrite insert using exceptions to avoid this copying. Establish only one handler per insertion rather than one handler per iteration.
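For reference, the copying behaviour the exercise describes can be sketched in Haskell roughly as follows. This is a minimal sketch, not the book's code: the Tree type mirrors the one used in the Haskell attempt later in this question, and insertNaive is a hypothetical name chosen for illustration.

```haskell
-- Same shape as the Tree type used in the Haskell attempt further down.
data Tree a = Empty | Fork (Tree a) a (Tree a)
  deriving (Show, Eq)

-- Baseline insert (hypothetical name). Every Fork on the search path is
-- rebuilt on the way back up, even when x is already in the tree.
insertNaive :: Ord a => a -> Tree a -> Tree a
insertNaive x Empty = Fork Empty x Empty
insertNaive x t@(Fork l y r)
  | x < y     = Fork (insertNaive x l) y r  -- fresh Fork around the new left subtree
  | x > y     = Fork l y (insertNaive x r)  -- fresh Fork around the new right subtree
  | otherwise = t                           -- found: only this one node is reused
```

When x is already present, the result is structurally equal to the input, but every node above the match has been allocated again. That redundant allocation is what the exercise asks you to avoid.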

Now, my understanding is that ML, being an impure language, gets by with a conventional approach to exception handling not so different to, say, Java's, so you can accomplish it something like this:

datatype tree = E | T of tree * int * tree

exception ElementPresent

fun insert (x, t) =
  let fun go E = T (E, x, E)
        | go (T (l, y, r)) =
            if      x < y then T (go l, y, r)
            else if y < x then T (l, y, go r)
            else    raise ElementPresent
  in go t
  end
  handle ElementPresent => t

I don't have an ML implementation, so this may not be quite right in terms of the syntax.

My issue is that I have no idea how to do this in Haskell, short of doing everything in the IO monad. That seems like cheating, and even if it isn't, it would seriously limit the usefulness of a function that doesn't actually do any mutation. I could use the Maybe monad:

data Tree a = Empty | Fork (Tree a) a (Tree a)
        deriving (Show)

insert :: (Ord a) => a -> Tree a -> Tree a
insert x t = maybe t id (go t)
  where
    -- go yields Nothing when x is already present, so insert can return t as is.
    go Empty = return (Fork Empty x Empty)
    go (Fork l y r)
      | x < y     = do l' <- go l; return (Fork l' y r)
      | x > y     = do r' <- go r; return (Fork l y r')
      | otherwise = Nothing

This means everything winds up wrapped in Just on the way back up when the element isn't found, which requires more heap allocation, and sort of defeats the purpose. Is this allocation just the price of purity?

EDIT to add: A lot of why I'm wondering about the suitability of the Maybe solution is that the optimization described only seems to save you all the constructor calls you would need in the case where the element already exists, which means heap allocations proportional to the length of the search path. The Maybe also avoids those constructor calls when the element already exists, but then you get a number of Just constructor calls equal to the length of the search path. I understand that a sufficiently smart compiler could elide all the Just allocations, but I don't know if, say, the current version of GHC is really that smart.
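One pure alternative to the Maybe version is continuation-passing style. The sketch below is an assumption of how that could look, not something taken from the book; insertCPS is a hypothetical name. The helper go carries a continuation that rebuilds the path, and simply drops it when the element turns out to be present.

```haskell
data Tree a = Empty | Fork (Tree a) a (Tree a)
  deriving (Show, Eq)

-- insertCPS is a hypothetical name. The continuation k rebuilds the search
-- path from the bottom up. If x is found, k is never called and the original
-- tree t is returned untouched.
insertCPS :: Ord a => a -> Tree a -> Tree a
insertCPS x t = go t id
  where
    go Empty k = k (Fork Empty x Empty)
    go (Fork l y r) k
      | x < y     = go l (\l' -> k (Fork l' y r))
      | x > y     = go r (\r' -> k (Fork l y r'))
      | otherwise = t
```

This builds no Fork nodes when the element already exists, but it does not settle the allocation question: each step down the tree may allocate a closure for the continuation, so it roughly trades Just cells for closures unless the compiler optimizes them away.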

Pillsy asked May 22 '14 13:05



1 Answer

In terms of cost, the ML version is actually very similar to your Haskell version.

Every recursive call in the ML version results in a stack frame. The same is true in the Haskell version. This is going to be proportional in size to the path that you traverse in the tree. Also, both versions will of course allocate new nodes for the entire path if an insertion is actually performed.

In your Haskell version, every recursive call might also eventually result in the allocation of a Just node. This will go on the minor heap (the nursery, in GHC's terminology), which is just a block of memory with a bump pointer. For all practical purposes, allocating on GHC's minor heap is roughly equivalent in cost to using the stack. Since these are short-lived allocations, they won't normally end up being moved to the major heap at all.

Jake McArthur answered Oct 12 '22 22:10