Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

subsequences of length n from list performance

I implemented a version of this answer https://stackoverflow.com/a/9920425/1261166 (I don't know what was intended by the person answering)

sublistofsize 0 _        = [[]]
sublistofsize _ []       = []
sublistofsize n (x : xs) = sublistsThatStartWithX ++ sublistsThatDontStartWithX
  where sublistsThatStartWithX = map (x:) $ sublistofsize (n-1) xs
        sublistsThatDontStartWithX = sublistofsize n xs

what I'm unsure of is sublistsThatStartWithX = map (x:) $ sublistofsize (n-1) xs

I assume that map (x:) gives a problem performance wise, but not sure of how to solve it. I've done profiling on print $ length $ sublistofsize 5 $ primesToTakeFrom 50

COST CENTRE                                  MODULE                                        no.     entries  %time %alloc   %time %alloc
sublistofsize                             Main                                          112     4739871   46.9   39.9    96.9  100.0
 sublistofsize.sublistsThatDontStartWithX Main                                          124     2369935    2.2    0.0     2.2    0.0
 sublistofsize.sublistsThatStartWithX     Main                                          116     2369935   47.8   60.1    47.8   60.1

Did I implement it in a good way? Are there any faster ways of doing it?

like image 593
Viktor Mellgren Avatar asked Jan 21 '14 17:01

Viktor Mellgren


3 Answers

I assume that map (x:) gives a problem performance wise

No. map is coded efficiently and runs in linear time, no problems here.

However, your recursion might be a problem. You're both calling sublistofsize (n-1) xs and sublistofsize n xs, which - given a start list sublistofsize m (_:_:ys) - does evaluate the term sublistofsize (m-1) ys twice, as there is no sharing between them in the different recursive steps.

So I'd apply dynamic programming to get

subsequencesOfSize :: Int -> [a] -> [[a]]
subsequencesOfSize n xs = let l = length xs
                          in if n>l then [] else subsequencesBySize xs !! (l-n)
 where
   subsequencesBySize [] = [[[]]]
   subsequencesBySize (x:xs) = let next = subsequencesBySize xs
                             in zipWith (++) ([]:next) (map (map (x:)) next ++ [[]])

Not that appending the empty lists is the most beautiful solution, but you can see how I have used zipWith with the displaced lists so that the results from next are used twice - once directly in the list of subsequences of length n and once in the list of subsequences of length n+1.

Testing it in GHCI with :set +s, you can see how this is drastically faster than the naive solutions:

*Main> length $ subsequencesOfSize 7 [1..25]
480700
(0.25 secs, 74132648 bytes)
(0.28 secs, 73524928 bytes)
(0.30 secs, 73529004 bytes)
*Main> length $ sublistofsize 7 [1..25] -- @Vixen (question)
480700
(3.03 secs, 470779436 bytes)
(3.35 secs, 470602932 bytes)
(3.14 secs, 470747656 bytes)
*Main> length $ sublistofsize' 7 [1..25] -- @Ganesh
480700
(2.00 secs, 193610388 bytes)
(2.00 secs, 193681472 bytes)
*Main> length $ subseq 7 [1..25] -- @user5402
480700
(3.07 secs, 485941092 bytes)
(3.07 secs, 486279608 bytes)
like image 57
Bergi Avatar answered Sep 19 '22 12:09

Bergi


Your implementation is the natural "Haskell-ish" one for that problem.

If you end up using the entire result, then there won't be anything asymptotically faster for this problem given the output datastructure ([[a]]) because it runs in time linear in the length of the output.

The use of map (x:) is a very natural way to add an element onto the start of each list and there's unlikely to be any significantly faster options given that we are working with lists.

In principle the repeated use of (++) is inefficient as it causes the left-hand argument to be traversed each time it is called, but the total cost in this case should only be an extra constant factor.

You might be able to improve it with the use of an accumulating parameter otherResults to collect the results, but to make this change you also need to pass down prefix in reversed order and re-reverse it at the end, which could well eat up the savings:

sublistofsize' 0 _        prefix otherResults = reverse prefix : otherResults
sublistofsize' _ []       prefix otherResults = otherResults
sublistofsize' n (x : xs) prefix otherResults =
   sublistofsize' (n-1) xs (x:prefix) (sublistofsize' n xs prefix otherResults)

sublistofsize n xs = sublistofsize' n xs [] []
like image 27
GS - Apologise to Monica Avatar answered Sep 18 '22 12:09

GS - Apologise to Monica


An optimization which should help is to keep track of whether there are enough elements in the list to form the rest of the subsequence. This can be done very efficiently by keeping track of a pointer which is n-1-elements ahead of xs and advancing them both as you recurse.

An implementation:

  nthtail 0 xs = xs
  nthtail _ [] = []
  nthtail n (x:xs) = nthtail (n-1) xs

  subseq 0 _ = [[]]
  subseq n xs =
    if null t
      then []
      else go n xs t
    where
      t = nthtail (n-1) xs  -- n should always be >= 1 here
      go 0 _ _  =  [[]]
      go _ _ [] = []
      go n xs@(x:xt) t = withx ++ withoutx
        where withx = map (x:) $ go (n-1) xt t
              withoutx = go n xt (tail t)
like image 39
ErikR Avatar answered Sep 19 '22 12:09

ErikR