Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Haskell: Flatten binary tree

I was thinking about flattening a binary tree to a list, for latter processing.

I first thought of using (++) to join the left and right branches, but then thought in the worse case that would take O(n^2) time.

I then thought of building the list backwards, using (:) to append to the front in linear time. However, then I thought if I send this list to a fold-like function, it has to wait until the entire tree is traversed before it can begin folding, and hence can't use list fusion.

I then came up with the following:

data Tree a = Node a (Tree a) (Tree a) | Tip

flatten :: Tree a -> [a]
flatten x = (flatten' x) []

flatten' :: Tree a -> [a] -> [a]
flatten' (Node x left right) l = (flatten' left (x:(flatten' right l)))
flatten' Tip l = l

main = 
  putStrLn $ show $ flatten $ 
    (Node 2 (Node 1 Tip Tip) (Node 4 (Node 3 Tip Tip) Tip))

Will this work in O(n) time, take "stack space" no more than proportional to the greatest depth of the tree and can it be fused with a consuming function (i.e. intermediate list eliminated)? Is this the "correct" way to flatten a tree?

like image 560
Clinton Avatar asked May 15 '12 01:05

Clinton


2 Answers

I don't know much about fusion, but I think recursive functions in general cannot be fused. But remember that when you are dealing with lists in Haskell, intermediate lists usually do not exist as a whole all at once -- you will know the beginning and not have computed the end, and then later you'll throw away the beginning and know the end (in as many steps as there are elements of the list). This is not fusion, this is more like "stream well-behavedness", and means that the space requirements are better if the output is consumed incrementally.

Anyway, yes, I think this is the best way to flatten a tree. When the output of an algorithm is a list but otherwise the list is unexamined, and there is concatenation going on, then difference lists (DLists) are usually best way to go. They represent a list as a "prepender function", which eliminates the need for a traversal when you append, since appending is just function composition.

type DList a = [a] -> [a]

fromList :: [a] -> DList a
fromList xs = \l -> xs ++ l

append :: DList a -> DList a -> DList a
append xs ys = xs . ys

toList :: DList a -> [a]
toList xs = xs []

Those are the essentials of the implementation, the rest can be derived from that. The naive flattening algorithm in DLists is:

flatten :: Tree a -> DList a
flatten (Node x left right) = flatten left `append` fromList [x] `append` flatten right
flatten Tip = fromList []

Let's do a little expansion. Start with the second equation:

flatten Tip = fromList []
            = \l -> [] ++ l
            = \l -> l
flatten Tip l = l

See where this is going? Now the first equation:

flatten (Node x left right) 
    = flatten left `append` fromList [x] `append` flatten right
    = flatten left . fromList [x] . flatten right
    = flatten left . (\l -> [x] ++ l) . flatten right
    = flatten left . (x:) . flatten right
flatten (Node x) left right l
    = (flatten left . (x:) . flatten right) l
    = flatten left ((x:) (flatten right l))
    = flatten left (x : flatten right l)

Which shows how the DList formulation is equal to your function!

flatten' :: Tree a -> [a] -> [a]
flatten' (Node x left right) l = (flatten' left (x:(flatten' right l)))
flatten' Tip l = l

I don't have any proof for why DList is better than other approaches (and ultimately it depends on how you are consuming your output), but DList is the canonical way to do this efficiently, and that is what you have done.

like image 197
luqui Avatar answered Nov 05 '22 15:11

luqui


flatten' is tail recursive, so it shouldn't take any stack space. It will however walk down the left side of the tree, spitting out a bunch of thunks in the heap. If you invoke it on your example tree, and reduce it to WHNF, you should get something that looks like this:

  :
 / \
1  flatten' Tip :
               / \
              2   flatten' (Node 4) []
                           /      \
                         (Node 3) Tip
                        /       \
                       Tip      Tip

The algorithm is O(N), but it has to examine the Tips as well as the Nodes.

This looks to be the most efficient way to flatten your tree in-order. The Data.Tree module has a flatten function here which does much the same thing, except it prefers a pre-order traversal.

Update:

In a graph reduction engine, the flatten in main will generate a graph like this:

               @
              / \
             @  []
            / \
           /   \
          /     \
       flatten' Node 2
                /    \
               /      \
              /        \
           Node 1    Node 4
           /   \     /   \
          Tip  Tip  /     \
                   /       \
                Node 3     Tip
                /   \
               Tip  Tip

In order to reduce this to WHNF, the graph reduction engine will unroll the spine, pushing the [] and the Node 2 onto the stack. It will then invoke the flatten' function, which will rewrite the graph to this:

                 @
                / \
               /   \
              /     \
             @       :
            / \     / \
           /   \   2   \
          /     \       \
       flatten' Node 1   \
                /   \     \
               Tip  Tip    @
                          / \
                         @  []
                        / \
                       /   \
                      /     \
                   flatten' Node 4
                            /   \
                           /     \
                          /       \
                       Node 3     Tip
                       /   \
                      Tip  Tip

And will pop the two arguments from the stack. The root node is still not in WHNF, so the graph reduction engine will unroll the spine, pushing the 2:... and the Node 1 onto the stack. It will then invoke the flatten' function, which will rewrite the graph to this:

                 @
                / \
               /   \
              /     \
             @       :
            / \     / \
           /   \   1   \
          /     \       \
       flatten' Tip      @
                        / \
                       /   \
                      /     :
                     @     / \
                    / \   2   \
                   /  Tip      @
                  /           / \
               flatten'      @  []
                            / \
                           /   \
                          /     \
                       flatten' Node 4
                                /   \
                               /     \
                              /       \
                           Node 3     Tip
                           /   \
                          Tip  Tip

And will pop the two arguments from the stack. The root node is still not in WHNF, so the graph reduction engine will unroll the spine, pushing the 1:... and the Tip onto the stack. It will then invoke the flatten' function, which will rewrite the graph to this:

                 :
                / \
               1   \
                    \
                     @
                    / \
                   /   \
                  /     :
                 @     / \
                / \   2   \
               /  Tip      @
              /           / \
           flatten'      @  []
                        / \
                       /   \
                      /     \
                   flatten' Node 4
                            /   \
                           /     \
                          /       \
                       Node 3     Tip
                       /   \
                      Tip  Tip

And will pop the two arguments from the stack. We are now in WHNF, having consumed a maximum of two stack entries (assuming the Tree nodes were not thunks that required additional stack space to evaluate).

So, flatten' is tail-recursive. It replaces itself without having to evaluate additional nested redexes. The second flatten' remains a thunk in the heap, not the stack.

like image 30
pat Avatar answered Nov 05 '22 17:11

pat