The following function returns the list of paths from the root node of a tree to each of its leaves:
paths :: Tree a -> [[a]]
paths (Node element []) = [[element]]
paths (Node element children) = map (element :) $ concat $ map paths children
This looks very inefficient on paper, since concat has terrible complexity. Can this function be rewritten in a way that keeps the complexity lower without using intermediate data structures (like sequence)?
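For reference, here is the function in action on a small tree (a self-contained sketch assuming the `Node` constructor from `Data.Tree`, which the code above appears to use):

```haskell
import Data.Tree (Tree(..))

paths :: Tree a -> [[a]]
paths (Node element []) = [[element]]
paths (Node element children) = map (element :) $ concat $ map paths children

main :: IO ()
main = print (paths (Node 1 [Node 2 [Node 4 []], Node 3 []]))
-- prints [[1,2,4],[1,3]]
```

Each leaf yields exactly one path, with the root's element at the head.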
EDIT: to be honest, I know one could avoid the O(n)-per-call cost of concat by carrying the path from the root as an accumulator while traversing the tree. Here is a JavaScript implementation that illustrates this algorithm:
function paths(tree) {
    var result = [];
    (function go(node, path) {
        if (node.children.length === 0)
            result.push(path.concat([node.tag]));
        else
            node.children.map(function (child) {
                go(child, path.concat([node.tag]));
            });
    })(tree, []);
    return result;
}

console.log(paths(
    {tag: 1,
     children: [
         {tag: 2, children: [{tag: 20, children: []}, {tag: 200, children: []}]},
         {tag: 3, children: [{tag: 30, children: []}, {tag: 300, children: []}]},
         {tag: 4, children: [{tag: 40, children: []}, {tag: 400, children: []}]}]}));
(It is not actually O(1) per iteration, since I used Array.concat instead of consing onto a list (JS has no built-in cons lists), but using consing instead would make each step constant-time.)
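A direct Haskell transliteration of that accumulator idea might look like this (a sketch; `pathsAcc` is a made-up name). Consing onto the reversed path is O(1) per step, and each leaf pays a single O(depth) reverse to restore root-to-leaf order:

```haskell
import Data.Tree (Tree(..))

-- Carry the path from the root as a reversed list; cons is O(1),
-- and each leaf reverses its path once to get root-to-leaf order.
pathsAcc :: Tree a -> [[a]]
pathsAcc = go []
  where
    go path (Node a [])       = [reverse (a : path)]
    go path (Node a children) = concatMap (go (a : path)) children

main :: IO ()
main = print (pathsAcc (Node 1 [Node 2 [], Node 3 []]))
-- prints [[1,2],[1,3]]
```

Since every output list must be materialized anyway, the per-leaf reverse does not change the overall asymptotic cost.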
concat does not have terrible complexity; it is O(n), where n is the total number of elements in all the lists except the last. In this case, I don't think it's possible to do any better, with or without an intermediate structure, unless you change the type of the result. The list of lists, in this context, offers absolutely no potential for sharing, so you have no choice but to allocate each "cons" of each list. The concatMap only adds a constant-factor overhead, and I'd be surprised if you could find a way to reduce that significantly.
If you want to use some sharing (at the cost of structural laziness), you can indeed switch to a different data structure. This will only matter if the tree is somewhat "bushy". Any sequence type supporting snoc will do. At the simplest, you can even use lists in reverse, so you get paths leading from the leaves to the root instead of the other way around. Or you can use something more flexible like Data.Sequence.Seq:
import qualified Data.Sequence as S
import Data.Sequence ((|>), Seq)
import qualified Data.DList as DL
import Data.Tree

paths :: Tree a -> [Seq a]
paths = DL.toList . go S.empty
  where
    go s (Node a []) = DL.singleton (s |> a)
    go s (Node a xs) = let sa = s |> a
                       in sa `seq` DL.concat . map (go sa) $ xs
As Viclib and delnan point out, there was a problem with my original answer, because the bottom level got traversed multiple times.
Let's benchmark:
{-# LANGUAGE BangPatterns #-}

import Control.DeepSeq
import Criterion.Main
import Data.Sequence ((|>), Seq)
import Data.Tree
import GHC.DataSize
import qualified Data.DList as DL
import qualified Data.Sequence as S

-- original version
pathsList :: Tree a -> [[a]]
pathsList = go where
    go (Node element [])       = [[element]]
    go (Node element children) = map (element:) (concatMap go children)

-- with reversed lists, enabling sharing of path prefixes
pathsRevList :: Tree a -> [[a]]
pathsRevList = go [] where
    go acc (Node a []) = [a:acc]
    go acc (Node a xs) = concatMap (go (a:acc)) xs

-- dfeuer's version
pathsSeqDL :: Tree a -> [Seq a]
pathsSeqDL = DL.toList . go S.empty where
    go s (Node a []) = DL.singleton (s |> a)
    go s (Node a xs) = let sa = s |> a
                       in sa `seq` DL.concat . map (go sa) $ xs

-- same as previous, but without DLists
pathsSeq :: Tree a -> [Seq a]
pathsSeq = go S.empty where
    go acc (Node a []) = [acc |> a]
    go acc (Node a xs) = let acc' = acc |> a
                         in acc' `seq` concatMap (go acc') xs

genTree :: Int -> Int -> Tree Int
genTree branch depth = go 0 depth where
    go n 0 = Node n []
    go n d = Node n [go n' (d - 1) | n' <- [n .. n + branch - 1]]

memSizes = do
    let !tree = force $ genTree 4 4
    putStrLn "sizes in memory"
    putStrLn . ("list: "++) . show =<< (recursiveSize $!! pathsList tree)
    putStrLn . ("listRev: "++) . show =<< (recursiveSize $!! pathsRevList tree)
    putStrLn . ("seq: "++) . show =<< (recursiveSize $!! pathsSeq tree)
    putStrLn . ("tree itself: "++) . show =<< (recursiveSize $!! tree)

benchPaths !tree = do
    defaultMain [
        bench "pathsList"    $ nf pathsList tree,
        bench "pathsRevList" $ nf pathsRevList tree,
        bench "pathsSeqDL"   $ nf pathsSeqDL tree,
        bench "pathsSeq"     $ nf pathsSeq tree
        ]

main = do
    memSizes
    putStrLn ""
    putStrLn "normal tree"
    putStrLn "-----------------------"
    benchPaths (force $ genTree 6 8)
    putStrLn "\ndeep tree"
    putStrLn "-----------------------"
    benchPaths (force $ genTree 2 20)
    putStrLn "\nwide tree"
    putStrLn "-----------------------"
    benchPaths (force $ genTree 35 4)
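As a quick sanity check on genTree (using `countLeaves`, a hypothetical helper), the number of root-to-leaf paths should equal branch^depth:

```haskell
import Data.Tree (Tree(..))

genTree :: Int -> Int -> Tree Int
genTree branch depth = go 0 depth where
    go n 0 = Node n []
    go n d = Node n [go n' (d - 1) | n' <- [n .. n + branch - 1]]

-- Count the leaves, i.e. the number of root-to-leaf paths.
countLeaves :: Tree a -> Int
countLeaves (Node _ []) = 1
countLeaves (Node _ xs) = sum (map countLeaves xs)

main :: IO ()
main = print (countLeaves (genTree 4 4))  -- 4^4 = 256
```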
Some notes:
genTree labels the nodes with varying Int-s in order to prevent a GHC optimization from sharing subtrees. In memSizes the tree must be pretty small, because recursiveSize has quadratic complexity.

Results on my Core i7 3770:
sizes in memory
list: 37096
listRev: 14560
seq: 26928
tree itself: 16576
normal tree
-----------------------
pathsList 372.9 ms
pathsRevList 213.6 ms
pathsSeqDL 962.2 ms
pathsSeq 308.8 ms
deep tree
-----------------------
pathsList 554.1 ms
pathsRevList 266.7 ms
pathsSeqDL 919.8 ms
pathsSeq 438.4 ms
wide tree
-----------------------
pathsList 191.6 ms
pathsRevList 129.1 ms
pathsSeqDL 448.2 ms
pathsSeq 157.3 ms
Comments:
It pays to use a DList only when we would otherwise have inefficient list appends, but that is not the case here. Seq performs relatively worse, presumably because snoc on a Seq is costlier than consing onto a list, and Seq is a heavier-weight structure, although it supports a wider range of efficient operations.