I’m trying to learn Haskell and I was trying to create a function that takes a list of lists and groups the sublist by equivalent sums. This is not homework.
import Data.List
let x = [[1,2],[2,1],[5,0],[0,3],[1,9]]
let groups = groupBy (\i j -> sum i == sum j) x
I get this output in GHCi:
[[[1,2],[2,1]],[[5,0]],[[0,3]],[[1,9]]]
I get [[1,2],[2,1]]
grouping together, but not with [0,3]
. Why is this?
I suspect I need to use map
, but I can’t seem to make it work.
The groupBy
function preserves the input order and is thus invertible. If you’re willing to throw away that information, you could use code along the lines of
import Data.List (foldl')
import Data.Map (elems,empty,insertWith')
bucketBy :: Ord b => (a -> b) -> [a] -> [[a]]
bucketBy eq = elems . foldl' go empty
where go m l = insertWith' (++) (eq l) [l] m
In action:
*Main> bucketBy sum x [[[0,3],[2,1],[1,2]],[[5,0]],[[1,9]]]
The application of elems
from Data.Map gives a clue for what’s happening.
elems :: Map κ α -> [α]
O(n). Return all elements of the map in the ascending order of their keys.
elems (fromList [(5,"a"), (3,"b")]) == ["b","a"] elems empty == []
A Map associates values of some type κ with values of another possibly distinct type α. In the example from your question, you start with x
whose type is
*Main> :type x x :: [[Integer]]
That is, x
is a list of integer lists. The type of the resulting partition of x
you want is
*Main> :t [[[0,3],[2,1],[1,2]],[[5,0]],[[1,9]]] [[[0,3],[2,1],[1,2]],[[5,0]],[[1,9]]] :: Num τ => [[[τ]]]
or a list of lists where each of the latter lists are themselves lists that all have the same sum. The Num τ =>
bit is a context that constrains the type τ to be an instance of the typeclass Num
. Happy for us, Integer
is such a type:
*Main> :info Integer data Integer … instance Num Integer -- Defined in GHC.Num …
We know then that the type of the partition is [[[Integer]]]
. This typeclass nonsense may seem unnecessarily fussy, but we’ll need the concept again in just a moment. (To give you an idea of what’s going on, the typechecker doesn’t have enough information to decide whether the literal 0
, for example, is of type Int
or Integer
.)
Each sublist contains lists with the same sum. In other words, there exists a mapping from a sum to a list of integer lists. Therefore, the type of the Map used in bucketBy
must resemble
Map Integer [[Integer]]
For example, with the sum 3 we associate the list
[ [0,3]
, [2,1]
, [1,2]
]
Folding is a highly general pattern. Left fold, foldl
and friends in Haskell lets you “insert” an operator between elements of a list beginning with the zero value at the left end of the list. For example, the sum of [5,3,9,1]
expressed as a left fold is
((((0 + 5) + 3) + 9) + 1)
or
foldl (+) 0 [5,3,9,1]
That is, beginning with a base value of zero, we successively add elements of the list and accumulate the sum.
Recall the definition of bucketBy
contains
elems . foldl' go empty
This means the result of the left fold must be of type Map Integer [[Integer]]
, the zero value for our fold is the empty Map of that type, and go
is somehow adding each successive value of a list into the map.
Note that foldl'
is the strict cousin of foldl
, but strictness is beyond the scope of this answer. (See also “Stack overflow” on HaskellWiki.)
Given the type of foldl'
*Main> :t foldl' foldl' :: (a -> b -> a) -> a -> [b] -> a
we should have three arguments in the application, but only two are present in the code above. This is because the code is written in point-free style. Your list is there implicitly due to partial application of foldl'
.
Think back to the sum-as-fold example above. The type of that application without the final argument is
*Main> :t foldl (+) 0 foldl (+) 0 :: Num b => [b] -> b
Partial application allows us to create new functions. Here we defined a function that computes a number from some list of numbers. Hmm, sounds familiar.
*Main> :t sum sum :: Num a => [a] -> a
The .
combinator expresses function composition. Its name is chosen to resemble the notation g∘f as commonly seen in mathematics textbooks to mean “do f first and then compute g from the result.” This is exactly what’s happening in the definition of bucketBy
: fold the list of values into a Map and then get the values of out the Map.
In your comment, you asked about the purpose of m
. With an explicit type annotation, we might define go
as
...
where go :: Map Integer [[Integer]] -> [Integer] -> Map Integer [[Integer]]
go m l = insertWith' (++) (eq l) [l] m
Matching variables with types, m
is the Map we’ve accumulated so far, and l
is the next Integer
list that we want to toss into the appropriate bucket. Recall that eq
is an argument to the outer bucketBy
.
We can control how a new item goes into the map using insertWith'
. (By convention, functions whose names end with trailing quotes are strict variants.)
The (++)
combinator appends lists. The application eq l
determines the appropriate bucket for l
.
Had we written l
rather than [l]
, the result would want to be
*Main> bucketBy sum x [[0,3,2,1,1,2],[5,0],[1,9]]
but then we lose the structure of the innermost lists.
We’ve already constrained the type of bucketBy
's result to be [[[α]]]
and thus the type of the Map's elements. Say the next item l
to fold is [1,2]
. We want to append, (++)
, it to some other list of type [[Integer]]
, but the types don’t match.
*Main> [[0,3],[2,1]] ++ [1,2] <interactive>:1:21: No instance for (Num [t0]) arising from the literal `2' Possible fix: add an instance declaration for (Num [t0]) In the expression: 2 In the second argument of `(++)', namely `[1, 2]' In the expression: [[0, 3], [2, 1]] ++ [1, 2]
Wrapping l
gets us
*Main> [[0,3],[2,1]] ++ [[1,2]] [[0,3],[2,1],[1,2]]
You might stop with
bucketBy :: ([Integer] -> Integer) -> [[Integer]] -> [[[Integer]]]
bucketBy eq = elems . foldl' go empty
where go m l = insertWith' (++) (eq l) [l] m
or even
bucketBy :: ([Integer] -> Integer) -> [[Integer]] -> [[[Integer]]]
bucketBy eq = elems . foldl' go empty
where go :: Map Integer [[Integer]] -> [Integer] -> Map Integer [[Integer]]
go m l = insertWith' (++) (eq l) [l] m
and be perfectly happy because it handles the case from your question.
Suppose down the road you have a different list y
defined as
y :: [[Int]]
y = [[1,2],[2,1],[5,0],[0,3],[1,9]]
Even though the definition is very nearly identical to x
, bucketBy
is of no use with y
.
*Main> bucketBy sum y <interactive>:1:15: Couldn't match expected type `Integer' with actual type `Int' Expected type: [[Integer]] Actual type: [[Int]] In the second argument of `bucketBy', namely `y' In the expression: bucketBy sum y
Let’s assume you can’t change the type of y
for some reason. You might copy-and-paste to create another function, say bucketByInt
, where the only change is replacing Integer
with Int
in the type annotations.
This would be highly, highly unsatisfying.
Maybe later you have some list of strings that you want to bucket according to the length of the longest string in each. In this imaginary paradise you could
*Main> bucketBy (maximum . map length) [["a","bc"],["d"],["ef","g"],["hijk"]] [[["d"]],[["ef","g"],["a","bc"]],[["hijk"]]]
What you want is entirely reasonable: bucket some list of things using the given criterion. But alas
*Main> bucketBy (maximum . map length) [["a","bc"],["d"],["ef","g"],["hijk"]] <interactive>:1:26: Couldn't match expected type `Integer' with actual type `[a0]' Expected type: Integer -> Integer Actual type: [a0] -> Int In the first argument of `map', namely `length' In the second argument of `(.)', namely `map length'
Again, you may be tempted to write bucketByString
, but by this point, you’re ready to move away and become a shoe cobbler.
The typechecker is your friend. Go back to your definition of bucketBy
that’s specific to Integer
lists, simply comment out the type annotation and ask its type.
*Main> :t bucketBy bucketBy :: Ord k => (b -> k) -> [b] -> [[b]]
Now you can apply bucketBy
for the different cases above and get the expected results. You were already in paradise but didn’t know it!
Now, in keeping with good style, you provide annotations for the toplevel definition of bucketBy
to help the poor reader, perhaps yourself. Note that you must provide the Ord
constraint due to the use of insertWith'
, whose type is
insertWith' :: Ord k => (a -> a -> a) -> k -> a -> Map k a -> Map k a
You may want to be really explicit and give an annotation for the inner go
, but this requires use of the scoped type variables extension.
{-# LANGUAGE ScopedTypeVariables #-}
import Data.List (foldl')
import Data.Map (Map,elems,empty,insertWith')
bucketBy :: forall a b. Ord b => (a -> b) -> [a] -> [[a]]
bucketBy eq = elems . foldl' go empty
where go :: Map b [a] -> a -> Map b [a]
go m l = insertWith' (++) (eq l) [l] m
Without the extension and with a type annotation of
bucketBy :: Ord b => (a -> b) -> [a] -> [[a]]
the typechecker will fail with errors of the form
Could not deduce (b ~ b1) from the context (Ord b) bound by the type signature for bucketBy :: Ord b => (a -> b) -> [a] -> [[a]] at prog.hs:(10,1)-(12,46) `b' is a rigid type variable bound by the type signature for bucketBy :: Ord b => (a -> b) -> [a] -> [[a]] at prog.hs:10:1 `b1' is a rigid type variable bound by the type signature for go :: Map b1 [a1] -> a1 -> Map b1 [a1] at prog.hs:12:9 In the return type of a call of `eq' In the second argument of `insertWith'', namely `(eq l)' In the expression: insertWith' (++) (eq l) [l] m
This is because the typechecker treats the b
on the inner type annotation as a distinct and entirely unrelated type b1
even though a human reader plainly sees the intent that they be the same type.
Read the scoped type variables documentation for details.
You may wonder where the outer layer of brackets went. Notice that the type annotation generalized from
bucketBy :: ([Integer] -> Integer) -> [[Integer]] -> [[[Integer]]]
to
bucketBy :: forall a b. Ord b => (a -> b) -> [a] -> [[a]]
Note that [Integer]
is itself another type, represented here as a
.
groupBy
splits the list into chunks of adjacent elements satisfying the given predicate. Since in your case, the [0,3]
is separated from the [1,2]
and [2,1]
, the first group includes only these. To collect all elements of the list having the same sum into one group, you need some preprocessing, e.g. with sortBy
.
import Data.List
import Data.Function
import Data.Ord
groupBySum :: Num a => [[a]] -> [[[a]]]
groupBySum xss = groups
where
ys = map (\xs -> (sum xs,xs)) xss
sortedSums = sortBy (comparing fst) ys
groupedSums = groupBy ((==) `on` fst) sortedSums
groups = map (map snd) groupedSums
From hackage:
The group function takes a list and returns a list of lists such that the concatenation of the result is equal to the argument.
groupBy
is the same, except that you can specify your equality test. Thus, since in your input list [0,3]
is not adjacent to [1,2]
or [2,1]
, it is put on its own.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With