Efficient version of 'inits'

Question

That is, inits "abc" == ["","a","ab","abc"]

There is a standard version of inits in Data.List, but below I have written a version myself:

myInits = f id
 where
  f start (l:ls) = (start []):(f (start . (l:)) ls)
  f start [] = [(start [])]

Whilst my version is quite a bit simpler than the standard version, I suspect it's not as good for efficiency reasons. I suspect that when myInits l is fully evaluated it takes O(n^2) space. Take for example, myTails, an implementation of tails:

myTails a@(_:ls) = a:(myTails ls)
myTails [] = [[]]

Which is almost exactly the same as the standard version and I suspect achieves O(n) space by reusing the tails of the lists.

Could someone explain:

Why my version of inits is bad.
Why another approach is better (either the standard one in Data.List or your own).

Cirdec · Accepted Answer

Your myInits uses a technique called a difference list to make functions that build lists in linear time. I believe, but haven't checked, that the running time for completely evaluating myInits is O(n^2) requiring O(n^2) space. Fully evaluating inits also requires O(n^2) running time and space. Any version of inits will require O(n^2) space; lists built with : and [] can only share their tails, and there are no tails in common among the results of inits. The version of inits in Data.List uses an amortized time O(1) queue, much like the simpler queue described in the second half of a related answer. The Snoc referenced in the source code in Data.List is word play on Cons (another name for :) backwards - being able to append an item to the end of the list.

Briefly experimenting with these functions suggests myInits performs satisfactorily when used sparsely on a large list. On my computer, in ghci, myInits [1..] !! 8000000 yields results in a few seconds. Unfortunately, I have the horrifyingly inefficient implementation that shipped with ghc 7.8.3, so I can't compare myInits to inits.

Strictness

There is one big difference between myInits and inits and between myTails and tails. They have different meanings when applied to undefined or _|_ (pronounced "bottom", another symbol we use for undefined).

inits has the strictness property inits (xs ++ _|_) = inits xs ++ _|_, which, when xs is the empty list [] says that inits will still yield at least one result when applied to undefined

inits (xs ++ _|_) = inits xs ++ _|_
inits ([] ++ _|_) = inits [] ++ _|_
inits        _|_  = [[]]     ++ _|_
inits        _|_  =  []      :  _|_

We can see this experimentally

> head . inits $ undefined
[]

myInits does not have this property either for the empty list or for longer lists.

> head $ myInits undefined
*** Exception: Prelude.undefined

> take 3 $ myInits ([1,2] ++ undefined)
[[],[1]*** Exception: Prelude.undefined

We can fix this if we realize that f in myInits would yield start [] in either branch. Therefore, we can delay the pattern matching until it is needed to decide what to do next.

myInits' = f id
 where
  f start list = (start []):
                 case list of
                     (l:ls) -> f (start . (l:)) ls
                     []     -> []

This increase in laziness makes myInits' work just like inits.

> head $ myInits' undefined
[]

> take 3 $ myInits' ([1,2] ++ undefined)
[[],[1],[1,2]]

Similarly, the difference between your myTails and tails in Data.List is that tails yields the entire list as the first result before deciding if there will be a remainder of the list. The documentation says it obeys tails _|_ = _|_ : _|_, but it actually obeys a much stronger rule that's hard to describe easily.

Will Ness · Answer

The prefix-functions building can be separated from their reification as actual lists:

diffInits = map ($ []) . scanl (\a x -> a . (x:)) id

This is noticeably faster (tested inside GHCi), and is lazier than your version (see Cirdec's answer for the discussion):

diffInits        _|_  == []           :  _|_
diffInits (xs ++ _|_) == diffInits xs ++ _|_

Efficient version of 'inits'

Tags:

haskell

Clinton

2 Answers

Strictness

Cirdec

Will Ness

Recent Activity

Donate For Us

Efficient version of 'inits'

Tags:

haskell

Clinton

2 Answers

Strictness

Cirdec

Will Ness

Related questions

Recent Activity

Donate For Us