I was doing a few of the 99 Haskell Problems
earlier and I thought that exercise 27 ("write a function to enumerate the possible combinations") was interesting as it's a simple concept and it lends itself to multiple implementations.
I was curious about relative efficiency so I decided to run a couple of different implementations - results are in the table below. (For reference: Emacs bash ansi-term in LXDE (Ubuntu 14.04) running on VirtualBox; Thinkpad X220; 8gb RAM, i5 64bit 2.4ghz.)
TL;DR:
(i) Why are combination-generating techniques #7 and #8 (from the table below; code included at bottom of post) so much faster than the rest?
(ii) Also, what do the figures in the Bytes
column actually represent?
(i) It's odd because function #7 works by filtering the powerset (which is waaaay larger than the combinations list); I suspect this is laziness at work, i.e., that this is the function which is most effectively exploiting the fact that we've only asked for the length of the list and not the list itself. (Also, its 'memory usage' is lower than that of the other functions, but, then again, I'm not sure exactly what memory-related stat is being shown.)
Regarding function #8: kudos to Bergi for that ridiculously fast implementation and thanks to user5402 for suggesting the addition. Still trying to wrap my ahead around the speed difference of this one.
(ii) The figures in the Bytes
column are reported by GHCi after running the :set +s
command; they clearly don't represent max memory usage as I only have ~25gb of RAM + free HD space.)?
Code:
import Data.List
--algorithms to generate combinations
--time required to compute the following: length $ 13 "abcdefghijklmnopqrstuvwxyz"
--(90.14 secs, 33598933424 bytes)
combDC1 :: (Eq a) => Int -> [a] -> [[a]]
combDC1 n xs = filter (/= []) $ combHelper n n xs []
combHelper :: Int -> Int -> [a] -> [a] -> [[a]]
combHelper n _ [] chosen = if length chosen == n
then [chosen]
else [[]]
combHelper n i remaining chosen
| length chosen == n = [chosen]
| n - length chosen > length remaining = [[]]
| otherwise = combHelper n (i-1) (tail remaining) ((head remaining):chosen) ++
combHelper n i (tail remaining) chosen
--(167.63 secs, 62756587760 bytes)
combSoln1 :: Int -> [a] -> [([a],[a])]
combSoln1 0 xs = [([],xs)]
combSoln1 n [] = []
combSoln1 n (x:xs) = ts ++ ds
where
ts = [ (x:ys,zs) | (ys,zs) <- combSoln1 (n-1) xs ]
ds = [ (ys,x:zs) | (ys,zs) <- combSoln1 n xs ]
--(71.40 secs, 30480652480 bytes)
combSoln2 :: Int -> [a] -> [[a]]
combSoln2 0 _ = [ [] ]
combSoln2 n xs = [ y:ys | y:xs' <- tails xs
, ys <- combSoln2 (n-1) xs']
--(83.75 secs, 46168207528 bytes)
combSoln3 :: Int -> [a] -> [[a]]
combSoln3 0 _ = return []
combSoln3 n xs = do
y:xs' <- tails xs
ys <- combSoln3 (n-1) xs'
return (y:ys)
--(92.34 secs, 40541644232 bytes)
combSoln4 :: Int -> [a] -> [[a]]
combSoln4 0 _ = [[]]
combSoln4 n xs = [ xs !! i : x | i <- [0..(length xs)-1]
, x <- combSoln4 (n-1) (drop (i+1) xs) ]
--(90.63 secs, 33058536696 bytes)
combSoln5 :: Int -> [a] -> [[a]]
combSoln5 _ [] = [[]]
combSoln5 0 _ = [[]]
combSoln5 k (x:xs) = x_start ++ others
where x_start = [ x : rest | rest <- combSoln5 (k-1) xs ]
others = if k <= length xs then combSoln5 k xs else []
--(61.74 secs, 33053297832 bytes)
combSoln6 :: Int -> [a] -> [[a]]
combSoln6 0 _ = [[]]
combSoln6 _ [] = []
combSoln6 n (x:xs) = (map (x:) (combSoln6 (n-1) xs)) ++ (combSoln6 n xs)
--(8.41 secs, 10785499208 bytes)
combSoln7 k ns = filter ((k==).length) (subsequences ns)
--(3.15 secs, 2889815872 bytes)
subsequencesOfSize :: Int -> [a] -> [[a]]
subsequencesOfSize n xs = let l = length xs
in if n>l then [] else subsequencesBySize xs !! (l-n)
where
subsequencesBySize [] = [[[]]]
subsequencesBySize (x:xs) = let next = subsequencesBySize xs
in zipWith (++) ([]:next) (map (map (x:)) next ++ [[]])
You should also test the algorithm found in this SO answer:
subsequences of length n from list performance
subsequencesOfSize :: Int -> [a] -> [[a]]
subsequencesOfSize n xs = let l = length xs
in if n>l then [] else subsequencesBySize xs !! (l-n)
where
subsequencesBySize [] = [[[]]]
subsequencesBySize (x:xs) = let next = subsequencesBySize xs
in zipWith (++) ([]:next) (map (map (x:)) next ++ [[]])
On my machine I get the following timing and memory usage from ghci:
ghci> length $ combSoln7 13 "abcdefghijklmnopqrstuvwxyz"
10400600
(13.42 secs, 10783921008 bytes)
ghci> length $ subsequencesOfSize 13 "abcdefghijklmnopqrstuvwxyz"
10400600
(6.52 secs, 2889807480 bytes)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With