I was thinking about flattening a binary tree to a list, for latter processing.
I first thought of using (++)
to join the left and right branches, but then thought in the worse case that would take O(n^2)
time.
I then thought of building the list backwards, using (:)
to append to the front in linear time. However, then I thought if I send this list to a fold-like function, it has to wait until the entire tree is traversed before it can begin folding, and hence can't use list fusion.
I then came up with the following:
data Tree a = Node a (Tree a) (Tree a) | Tip
flatten :: Tree a -> [a]
flatten x = (flatten' x) []
flatten' :: Tree a -> [a] -> [a]
flatten' (Node x left right) l = (flatten' left (x:(flatten' right l)))
flatten' Tip l = l
main =
putStrLn $ show $ flatten $
(Node 2 (Node 1 Tip Tip) (Node 4 (Node 3 Tip Tip) Tip))
Will this work in O(n)
time, take "stack space" no more than proportional to the greatest depth of the tree and can it be fused with a consuming function (i.e. intermediate list eliminated)? Is this the "correct" way to flatten a tree?
I don't know much about fusion, but I think recursive functions in general cannot be fused. But remember that when you are dealing with lists in Haskell, intermediate lists usually do not exist as a whole all at once -- you will know the beginning and not have computed the end, and then later you'll throw away the beginning and know the end (in as many steps as there are elements of the list). This is not fusion, this is more like "stream well-behavedness", and means that the space requirements are better if the output is consumed incrementally.
Anyway, yes, I think this is the best way to flatten a tree. When the output of an algorithm is a list but otherwise the list is unexamined, and there is concatenation going on, then difference lists (DList
s) are usually best way to go. They represent a list as a "prepender function", which eliminates the need for a traversal when you append, since appending is just function composition.
type DList a = [a] -> [a]
fromList :: [a] -> DList a
fromList xs = \l -> xs ++ l
append :: DList a -> DList a -> DList a
append xs ys = xs . ys
toList :: DList a -> [a]
toList xs = xs []
Those are the essentials of the implementation, the rest can be derived from that. The naive flattening algorithm in DList
s is:
flatten :: Tree a -> DList a
flatten (Node x left right) = flatten left `append` fromList [x] `append` flatten right
flatten Tip = fromList []
Let's do a little expansion. Start with the second equation:
flatten Tip = fromList []
= \l -> [] ++ l
= \l -> l
flatten Tip l = l
See where this is going? Now the first equation:
flatten (Node x left right)
= flatten left `append` fromList [x] `append` flatten right
= flatten left . fromList [x] . flatten right
= flatten left . (\l -> [x] ++ l) . flatten right
= flatten left . (x:) . flatten right
flatten (Node x) left right l
= (flatten left . (x:) . flatten right) l
= flatten left ((x:) (flatten right l))
= flatten left (x : flatten right l)
Which shows how the DList
formulation is equal to your function!
flatten' :: Tree a -> [a] -> [a]
flatten' (Node x left right) l = (flatten' left (x:(flatten' right l)))
flatten' Tip l = l
I don't have any proof for why DList
is better than other approaches (and ultimately it depends on how you are consuming your output), but DList
is the canonical way to do this efficiently, and that is what you have done.
flatten'
is tail recursive, so it shouldn't take any stack space. It will however walk down the left side of the tree, spitting out a bunch of thunks in the heap. If you invoke it on your example tree, and reduce it to WHNF, you should get something that looks like this:
:
/ \
1 flatten' Tip :
/ \
2 flatten' (Node 4) []
/ \
(Node 3) Tip
/ \
Tip Tip
The algorithm is O(N)
, but it has to examine the Tip
s as well as the Node
s.
This looks to be the most efficient way to flatten your tree in-order. The Data.Tree
module has a flatten
function here which does much the same thing, except it prefers a pre-order traversal.
Update:
In a graph reduction engine, the flatten
in main
will generate a graph like this:
@
/ \
@ []
/ \
/ \
/ \
flatten' Node 2
/ \
/ \
/ \
Node 1 Node 4
/ \ / \
Tip Tip / \
/ \
Node 3 Tip
/ \
Tip Tip
In order to reduce this to WHNF, the graph reduction engine will unroll the spine, pushing the []
and the Node 2
onto the stack. It will then invoke the flatten'
function, which will rewrite the graph to this:
@
/ \
/ \
/ \
@ :
/ \ / \
/ \ 2 \
/ \ \
flatten' Node 1 \
/ \ \
Tip Tip @
/ \
@ []
/ \
/ \
/ \
flatten' Node 4
/ \
/ \
/ \
Node 3 Tip
/ \
Tip Tip
And will pop the two arguments from the stack. The root node is still not in WHNF, so the graph reduction engine will unroll the spine, pushing the 2:...
and the Node 1
onto the stack. It will then invoke the flatten'
function, which will rewrite the graph to this:
@
/ \
/ \
/ \
@ :
/ \ / \
/ \ 1 \
/ \ \
flatten' Tip @
/ \
/ \
/ :
@ / \
/ \ 2 \
/ Tip @
/ / \
flatten' @ []
/ \
/ \
/ \
flatten' Node 4
/ \
/ \
/ \
Node 3 Tip
/ \
Tip Tip
And will pop the two arguments from the stack. The root node is still not in WHNF, so the graph reduction engine will unroll the spine, pushing the 1:...
and the Tip
onto the stack. It will then invoke the flatten'
function, which will rewrite the graph to this:
:
/ \
1 \
\
@
/ \
/ \
/ :
@ / \
/ \ 2 \
/ Tip @
/ / \
flatten' @ []
/ \
/ \
/ \
flatten' Node 4
/ \
/ \
/ \
Node 3 Tip
/ \
Tip Tip
And will pop the two arguments from the stack. We are now in WHNF, having consumed a maximum of two stack entries (assuming the Tree
nodes were not thunks that required additional stack space to evaluate).
So, flatten'
is tail-recursive. It replaces itself without having to evaluate additional nested redexes. The second flatten'
remains a thunk in the heap, not the stack.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With