Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient version of 'inits'

Tags:

haskell

That is, inits "abc" == ["","a","ab","abc"]

There is a standard version of inits in Data.List, but below I have written a version myself:

myInits = f id
 where
  f start (l:ls) = (start []):(f (start . (l:)) ls)
  f start [] = [(start [])]

Whilst my version is quite a bit simpler than the standard version, I suspect it's not as good for efficiency reasons. I suspect that when myInits l is fully evaluated it takes O(n^2) space. Take for example, myTails, an implementation of tails:

myTails a@(_:ls) = a:(myTails ls)
myTails [] = [[]]

Which is almost exactly the same as the standard version and I suspect achieves O(n) space by reusing the tails of the lists.

Could someone explain:

  1. Why my version of inits is bad.
  2. Why another approach is better (either the standard one in Data.List or your own).
like image 458
Clinton Avatar asked Dec 27 '14 23:12

Clinton


2 Answers

Your myInits uses a technique called a difference list to make functions that build lists in linear time. I believe, but haven't checked, that the running time for completely evaluating myInits is O(n^2) requiring O(n^2) space. Fully evaluating inits also requires O(n^2) running time and space. Any version of inits will require O(n^2) space; lists built with : and [] can only share their tails, and there are no tails in common among the results of inits. The version of inits in Data.List uses an amortized time O(1) queue, much like the simpler queue described in the second half of a related answer. The Snoc referenced in the source code in Data.List is word play on Cons (another name for :) backwards - being able to append an item to the end of the list.

Briefly experimenting with these functions suggests myInits performs satisfactorily when used sparsely on a large list. On my computer, in ghci, myInits [1..] !! 8000000 yields results in a few seconds. Unfortunately, I have the horrifyingly inefficient implementation that shipped with ghc 7.8.3, so I can't compare myInits to inits.

Strictness

There is one big difference between myInits and inits and between myTails and tails. They have different meanings when applied to undefined or _|_ (pronounced "bottom", another symbol we use for undefined).

inits has the strictness property inits (xs ++ _|_) = inits xs ++ _|_, which, when xs is the empty list [] says that inits will still yield at least one result when applied to undefined

inits (xs ++ _|_) = inits xs ++ _|_
inits ([] ++ _|_) = inits [] ++ _|_
inits        _|_  = [[]]     ++ _|_
inits        _|_  =  []      :  _|_

We can see this experimentally

> head . inits $ undefined
[]

myInits does not have this property either for the empty list or for longer lists.

> head $ myInits undefined
*** Exception: Prelude.undefined

> take 3 $ myInits ([1,2] ++ undefined)
[[],[1]*** Exception: Prelude.undefined

We can fix this if we realize that f in myInits would yield start [] in either branch. Therefore, we can delay the pattern matching until it is needed to decide what to do next.

myInits' = f id
 where
  f start list = (start []):
                 case list of
                     (l:ls) -> f (start . (l:)) ls
                     []     -> []

This increase in laziness makes myInits' work just like inits.

> head $ myInits' undefined
[]

> take 3 $ myInits' ([1,2] ++ undefined)
[[],[1],[1,2]]

Similarly, the difference between your myTails and tails in Data.List is that tails yields the entire list as the first result before deciding if there will be a remainder of the list. The documentation says it obeys tails _|_ = _|_ : _|_, but it actually obeys a much stronger rule that's hard to describe easily.

like image 60
Cirdec Avatar answered Nov 04 '22 08:11

Cirdec


The prefix-functions building can be separated from their reification as actual lists:

diffInits = map ($ []) . scanl (\a x -> a . (x:)) id

This is noticeably faster (tested inside GHCi), and is lazier than your version (see Cirdec's answer for the discussion):

diffInits        _|_  == []           :  _|_
diffInits (xs ++ _|_) == diffInits xs ++ _|_
like image 34
Will Ness Avatar answered Nov 04 '22 08:11

Will Ness