Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Haskell groupBy function: How exactly does it work?

Tags:

haskell

I experienced the following behavior:

ghci :m +Data.List

ghci> groupBy (\x y -> succ x == y) [1..6]
[[1,2], [3,4], [5,6]]
ghci> groupBy (\x y -> succ x /= y) [1..6]
[[1], [2], [3], [4], [5], [6]]

ghci :m +Data.Char   -- just a test to verify that nothing is broken with my ghc

ghci> groupBy (const isAlphaNum) "split this"
["split"," this"]

which surprised me, I thought, based on the example down below, that groupBy splits a list whenever the predicate evaluates to True for two successive elements supplied to a predicate. But in my second example above it splits the list on every element, but the predicate should evaluate to False. I framed my assumption of how it works in Haskell as well, just so everybody understands how I believed it to work:

groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy p l@(x:y:ys)
  | p x y     = (x:y:(head l)) ++ (tail l)       -- assuming l has a tail, unwilling to
  | otherwise = [x] ++ (y:(head l)) ++ (tail l)  -- to debug this right now, I guess you
groupBy _ [x] = [[x]]                            -- already got the hang of it ;)

Which brought me to the conclusion, that it works somewhat different. So my question is, how does that function actually work?

like image 463
tifrel Avatar asked Aug 12 '17 19:08

tifrel


1 Answers

But in my second example above it splits the list on every element, but the predicate should evaluate to False.

In the second example it also evaluates every two consecutive elements. The function on which it works is const isAlphaNum. So that means that the type is:

const isAlphaNum :: b -> Char -> Bool

It thus calls the function with the beginning of the group and the element, but it takes only the second element into account.

So if we call it with: groupBy (const isAlphaNum) "split this", it will evaluate:

succs   2nd    const isAlphaNum
-------------------------------
"sp"    'p'    True
"sl"    'l'    True
"si"    'i'    True
"st"    't'    True
"s "    ' '    False
" t"    't'    True
" h"    'h'    True
" i"    'i'    True
" s"    's'    True

Every time const isAlphaNum is True, it will append the character to the current sequence. So in case we evaluate "t ", const isAlphaNum, it will evaluate to False, groupBy will start a new group.

So here we thus construct two groups since there is only one False.

We can also obtain this result if we analyze the function source code:

groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _  []     =  []
groupBy eq (x:xs) =  (x:ys) : groupBy eq zs
    where (ys,zs) = span (eq x) xs

So here groupBy will return the empty list if the given list is empty. In case it is not an empty list (x:xs) then we will construct a new sequence. The sequence starts with x, and contains furthermore all the elements of ys. ys is the first element of the 2-tuple constructed by span.

span :: (a -> Bool) -> [a] -> ([a],[a]) constructs a 2-tuple where the first element is the longest possible prefix of the list that satisfies the predicate, the predicate here is eq x so we keep adding elements to the group, as long as eq x y (with y an element) holds.

With the remaining part of the list ys, we construct a new group until the input is completely exhausted.

like image 184
Willem Van Onsem Avatar answered Nov 20 '22 04:11

Willem Van Onsem