I have a little alternative implementation of groupBy
, which is more useful to me than the version in Data.List
, because it doesn't require the test to be an equivalence relation:
groupBy' :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy' f = foldr step []
where step x [] = [[x]]
step x (xs:xss)
| x `f` head xs = (x:xs):xss
| otherwise = [x]:xs:xss
However, it's too eager and won't start computing for inputs like groupBy' (<) [1,2,3,2,3,4,1,undefined]
. I have read the HaskellWiki and Wikibooks articles which explain why certain things, like pattern matches, can make functions less lazy, and I think I understand most of the examples given there. Still, I don't understand why this function can't start producing output until it hits the undefined
. Are the pattern matches causing this behavior?
Since I have just read those articles, it's maybe lack of experience that makes me fail to apply what I read there to my example code. So, how could this particular implementation be changed in order to behave more lazily?
The key problem is that you know that step x xss
will always produce a result of the form (x:_):_
, but you are "hiding" this behind the pattern matches, so Haskell is forced to evaluate those first to determine which case of step
to choose before it even sees those constructors.
In general, for foldr f x
to be able to produce any output before reaching the end of the list, f
must be able to produce some output before examining its second argument.
We can fix this by splitting step
into two, so that we can produce the two (:)
constructors before doing the pattern matching on the second argument.
groupBy' f = foldr step []
where step x xss = let (ys, yss) = step' x xss in (x:ys):yss
step' x [] = ([], [])
step' x (xs:xss) | f x (head xs) = (xs, xss)
| otherwise = ([], xs:xss)
This is about as lazy as you can get it.
*Main> groupBy' (<) [1, 2, 3, 2, 3, 4, 1, undefined]
[[1,2,3],[2,3,4],[1*** Exception: Prelude.undefined
foldr step [] [1,2,3,...]
will expand to step 1 (foldr step [] [2,3])
. Now step
needs to decide whether to go in its first case or the second. For that it needs to know whether foldr step [] [2,3,...]
evaluates to an empty list. For that it needs to know whether step 2 (foldr step [] [3,...])
returns the empty list (which it never will, but Haskell does not know that). This goes on until the end of the list is reached (and if the list doesn't have an end, it goes on forever).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With