That is, inits "abc" == ["","a","ab","abc"]
There is a standard version of inits
in Data.List
, but below I have written a version myself:
myInits = f id
where
f start (l:ls) = (start []):(f (start . (l:)) ls)
f start [] = [(start [])]
Whilst my version is quite a bit simpler than the standard version, I suspect it's not as good for efficiency reasons. I suspect that when myInits l
is fully evaluated it takes O(n^2)
space. Take for example, myTails
, an implementation of tails
:
myTails a@(_:ls) = a:(myTails ls)
myTails [] = [[]]
Which is almost exactly the same as the standard version and I suspect achieves O(n)
space by reusing the tails of the lists.
Could someone explain:
inits
is bad.Data.List
or your own).Your myInits
uses a technique called a difference list to make functions that build lists in linear time. I believe, but haven't checked, that the running time for completely evaluating myInits
is O(n^2)
requiring O(n^2)
space. Fully evaluating inits
also requires O(n^2)
running time and space. Any version of inits
will require O(n^2)
space; lists built with :
and []
can only share their tails, and there are no tails in common among the results of inits
. The version of inits
in Data.List
uses an amortized time O(1)
queue, much like the simpler queue described in the second half of a related answer. The Snoc
referenced in the source code in Data.List
is word play on Cons
(another name for :
) backwards - being able to append an item to the end of the list.
Briefly experimenting with these functions suggests myInits
performs satisfactorily when used sparsely on a large list. On my computer, in ghci, myInits [1..] !! 8000000
yields results in a few seconds. Unfortunately, I have the horrifyingly inefficient implementation that shipped with ghc 7.8.3, so I can't compare myInits
to inits
.
There is one big difference between myInits
and inits
and between myTails
and tails
. They have different meanings when applied to undefined
or _|_
(pronounced "bottom", another symbol we use for undefined
).
inits
has the strictness property inits (xs ++ _|_) = inits xs ++ _|_
, which, when xs
is the empty list []
says that inits
will still yield at least one result when applied to undefined
inits (xs ++ _|_) = inits xs ++ _|_
inits ([] ++ _|_) = inits [] ++ _|_
inits _|_ = [[]] ++ _|_
inits _|_ = [] : _|_
We can see this experimentally
> head . inits $ undefined
[]
myInits
does not have this property either for the empty list or for longer lists.
> head $ myInits undefined
*** Exception: Prelude.undefined
> take 3 $ myInits ([1,2] ++ undefined)
[[],[1]*** Exception: Prelude.undefined
We can fix this if we realize that f
in myInits
would yield start []
in either branch. Therefore, we can delay the pattern matching until it is needed to decide what to do next.
myInits' = f id
where
f start list = (start []):
case list of
(l:ls) -> f (start . (l:)) ls
[] -> []
This increase in laziness makes myInits'
work just like inits
.
> head $ myInits' undefined
[]
> take 3 $ myInits' ([1,2] ++ undefined)
[[],[1],[1,2]]
Similarly, the difference between your myTails
and tails
in Data.List
is that tails
yields the entire list as the first result before deciding if there will be a remainder of the list. The documentation says it obeys tails _|_ = _|_ : _|_
, but it actually obeys a much stronger rule that's hard to describe easily.
The prefix-functions building can be separated from their reification as actual lists:
diffInits = map ($ []) . scanl (\a x -> a . (x:)) id
This is noticeably faster (tested inside GHCi), and is lazier than your version (see Cirdec's answer for the discussion):
diffInits _|_ == [] : _|_
diffInits (xs ++ _|_) == diffInits xs ++ _|_
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With