Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Building a BST from a depth-first preorder list in Haskell more idiomatically

This submission to Programming Praxis gives an O(n) function that "undoes" a preorder traversal of a binary search tree, converting a list back into a tree. Supplying the missing data declaration:

data Tree a = Leaf | Branch {value::a, left::Tree a, right:: Tree a}
                 deriving (Eq, Show)

fromPreOrder :: Ord a => [a] -> Tree a
fromPreOrder [] = Leaf
fromPreOrder (a:as) = Branch a l (fromPreOrder bs)
  where
    (l,bs) = lessThan a as

lessThan n [] = (Leaf,[])
lessThan n all@(a:as)
  | a >= n    = (Leaf,all)
  | otherwise = (Branch a l r,cs)
  where (l,bs) = lessThan a as
        (r,cs) = lessThan n bs

It's obvious that one constructor is added to the tree in each recursive step, which is key to its efficiency.

The only "problem" is that the list is threaded through the computation manually, which is not a terribly Haskellian way to do it and makes it a little harder to see that it is actually consumed element by element in a single-threaded manner.

I attempted to correct this using a state monad (prettified on Codepad):

import Control.Monad.State

data Tree a = Leaf
            | Branch {root::a, left::Tree a, right::Tree a}
               deriving (Eq,Show)

peek = State peek' where
  peek' [] = (Nothing,[])
  peek' a@(x:_) = (Just x,a)
 
pop = State pop' where
  pop' [] = error "Tried to read past the end of the list"
  pop' (_:xs) = ((),xs)

prebuild'::Ord a => State [a] (Tree a)
prebuild' = do
  next <- peek
  case next of
    Nothing -> return Leaf
    Just x -> do
                 pop
                 leftpart <- lessThan x
                 rightpart <- prebuild'
                 return (Branch x leftpart rightpart) 

lessThan n = do
  next <- peek
  case next of
    Nothing -> return Leaf
    Just x -> do
      if x < n
      then do
         pop
         leftpart <- lessThan x
         rightpart <- lessThan n
         return (Branch x leftpart rightpart)
      else
         return Leaf

prebuild::Ord a => [a] -> Tree a
prebuild = evalState prebuild'

Unfortunately, this just looks obscenely messy, and doesn't seem any easier to reason about.

One thought I haven't been able to get anywhere with yet (in part because I don't have a deep enough understanding of the underlying concepts, quite likely): could I use a left fold over the list that builds a continuation that ultimately produces the tree? Would that be possible? Also, would it be anything short of insane?

Another thought was to write this as a tree unfold, but I don't think it's possible to do that efficiently; the list will end up being traversed too many times and the program will be O(n^2).

Edit

Taking things from another direction, I have the nagging suspicion that it might be possible to come up with an algorithm that starts by splitting up the list into increasing segments and decreasing segments, but I haven't yet found something concrete to do with that idea.

like image 352
dfeuer Avatar asked Jan 08 '14 18:01

dfeuer


3 Answers

I think the problem you're having with State is that your primitives (push, pop, peek) are not the right ones. I think a better one would be something like available_, which checks if the front of the stack matches a particular condition, and executes something different in each case:

available_ p f m = do
    s <- get
    case s of
        x:xs | p x -> put xs >> f x
        _ -> m

Actually, in our use case, we can specialize a bit: we will always want to return a Leaf when the head of our stack doesn't satisfy the condition, and we'll always want to recurse when it does.

available p m = available_ p
    (\x -> liftM2 (Branch x) (lessThan' x) m)
    (return Leaf)

(You could also just write available to begin with and skip available_ entirely. In my first iteration, that is what I did.) Now writing fromPreOrder and lessThan are a snap, and also I think give some insight into their behavior. I'll name them with primes so we can double-check they do the right thing with QuickCheck.

fromPreOrder' = available (const True) fromPreOrder'
lessThan' n   = available (<n)         (lessThan' n)

And in ghci:

> quickCheck (\xs -> fromPreOrder (xs :: [Int]) == evalState fromPreOrder' xs)
+++ OK, passed 100 tests.
like image 131
Daniel Wagner Avatar answered Nov 15 '22 08:11

Daniel Wagner


While I can't answer the question about continuation passing, I believe that the State monad based implementation can be written much more clearly. First, we can use notational convenience such as those from Control.Applicative to make it easier to read. Second, we can upgrade the effect stack to include Maybe in order to capture the notion of failure (a) from taking the head of an empty list and (b) from the (a >= n) comparison as an effect.

import Control.Monad.State
import Control.Applicative

The final code uses the backtracking-state monad transformer stack. This means that we wrap State around Maybe instead of Maybe around State. In some sense we can think of this as meaning that failure is the "primary" effect. In practice it means that if the algorithm fails there's no way to continue using potentially bad state and so it must backtrack to the last known good state.

type Preord a b = StateT [a] Maybe b    

Since we keep taking the head of a list and want to capture that failure correctly, we'll use a "safe head" function (which is the natural destructor of a list anyway, despite not being in the base Haskell libraries)

-- Safe list destructor
uncons :: [a] -> Maybe (a, [a])
uncons []     = Nothing
uncons (a:as) = Just (a, as)

If we look at it cleverly we'll notice that this is already exactly the form of our monadic computation (StateT [a] Maybe b is isomorphic to [a] -> Maybe (b, [a])). We'll give it a more evocative name when lifted into the Monad.

-- Try to get the head or fail
getHead :: Preord a a
getHead = StateT uncons

A common feature of this algorithm is stopping local failures by providing a default value. I'll capture this in the certain combinator

-- Provides a default value for a failing computation     
certain :: b -> Preord a b -> Preord a b
certain def p = p <|> return def

And now we can write the final algorithm very cleanly in our Preord monad.

fromPreOrder :: Ord a => Preord a (Tree a)
fromPreOrder = certain Leaf $ do
  a <- getHead
  Branch a <$> lessThan a <*> fromPreOrder

lessThan :: Ord a => a -> Preord a (Tree a)
lessThan n = certain Leaf $ do
  a <- getHead
  guard (a < n)
  Branch a <$> lessThan a <*> lessThan n

Note that Applicative style helps to indicate that we're building the components of the Branch constructor using further effectful (state consuming) computations. The guard short-circuits lessThan when the pivot is already the least element in the pre-order traversal. We also explicitly see how both fromPreOrder and lessThan default out to Leaf when they cannot compute a better result.

(Also note that fromPreOrder and lessThan are nearly identical now, a commonality Daniel Wagner exploited in his own answer when writing available.)

We finally would want to hide all the monadic noise since, to an outside user, this is just a pure algorithm.

rebuildTree :: [a] -> Tree a
rebuildTree = fromMaybe Leaf . runStateT fromPreOrder

For a complete picture, here's the implementation of the algorithm using only the State monad. Note all the extra noise for handling failure! We've absorbed the entire popElse function into the effects of the backtracking state monad. We also lift the if up into the failure effect. Without that effect stack, our combinators are terrifically specific to the application instead of decomplected and useful elsewhere.

-- Try to take the head of the state list and return the default
-- if that's not possible.
popElse :: b -> (a -> State [a] b) -> State [a] b
popElse def go = do
  x <- get
  case x of
    []     -> return def
    (a:as) -> put as >> go a

push :: a -> State [a] ()
push a = modify (a:)

fromPreOrder :: Ord a => State [a] (Tree a)
fromPreOrder = popElse Leaf $ \a -> Branch a <$> lessThan a <*> fromPreOrder

lessThan :: Ord a => a -> State [a] (Tree a)
lessThan n = 
  popElse Leaf $ \a ->
    if a >= n
    then push a >> return Leaf
    else Branch a <$> lessThan a <*> lessThan n
like image 37
J. Abrahamson Avatar answered Nov 15 '22 07:11

J. Abrahamson


As you've said, the state monad doesn't really improve the situation, and I don't think it can be expected to, as it's both much too general in that it allows arbitrary access to the state, and annoying in that it enforces unnecessary sequencing.

At first glance, this looks quite like a foldr : we do one thing for the empty case, and in the (:) case we take the head off and make a recursive call based on the tail. However, as the recursive call isn't just using the tail directly, it isn't quite a foldr.

We could express it as a paramorphism but I don't think that really adds anything to the readability.

What I did notice is that the complicated recursion on the tail is all based on lessThan, which led me to the following idea for breaking down the algorithm:

lessThans [] = []
lessThans (a:as) = (a, l) : lessThans bs
   where (l, bs) = lessThan a as

fromPreOrder2 :: Ord a => [a] -> Tree a
fromPreOrder2 = foldr (\(a, l) r -> Branch a l r) Leaf . lessThans

I'm sure lessThans could have a better name but I'm not quite sure what!

The foldr can also be expressed as foldr (uncurry Branch) Leaf but I'm not sure if that's an improvement.

EDIT: also, lessThans is an unfoldr, leading to this version:

fromPreOrder3 :: Ord a => [a] -> Tree a
fromPreOrder3 = foldr (uncurry Branch) Leaf . unfoldr lessThanList

lessThanList [] = Nothing
lessThanList (a:as) = Just ((a, l), bs)
    where (l, bs) = lessThan a as
like image 26
GS - Apologise to Monica Avatar answered Nov 15 '22 07:11

GS - Apologise to Monica