Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Group sublist values based to equivalent sums with Haskell

Tags:

haskell

I’m trying to learn Haskell and I was trying to create a function that takes a list of lists and groups the sublist by equivalent sums. This is not homework.

import Data.List
let x = [[1,2],[2,1],[5,0],[0,3],[1,9]]
let groups = groupBy (\i j -> sum i == sum j) x

I get this output in GHCi:

[[[1,2],[2,1]],[[5,0]],[[0,3]],[[1,9]]]

I get [[1,2],[2,1]] grouping together, but not with [0,3]. Why is this?

I suspect I need to use map, but I can’t seem to make it work.

like image 370
turtle Avatar asked Mar 30 '12 20:03

turtle


3 Answers

The groupBy function preserves the input order and is thus invertible. If you’re willing to throw away that information, you could use code along the lines of

import Data.List (foldl')
import Data.Map (elems,empty,insertWith')

bucketBy :: Ord b => (a -> b) -> [a] -> [[a]]
bucketBy eq = elems . foldl' go empty
  where go m l = insertWith' (++) (eq l) [l] m

In action:

*Main> bucketBy sum x
[[[0,3],[2,1],[1,2]],[[5,0]],[[1,9]]]

How it works

The application of elems from Data.Map gives a clue for what’s happening.

elems :: Map κ α -> [α]

O(n). Return all elements of the map in the ascending order of their keys.

elems (fromList [(5,"a"), (3,"b")]) == ["b","a"]
elems empty == []

Mapping

A Map associates values of some type κ with values of another possibly distinct type α. In the example from your question, you start with x whose type is

*Main> :type x
x :: [[Integer]]

That is, x is a list of integer lists. The type of the resulting partition of x you want is

*Main> :t [[[0,3],[2,1],[1,2]],[[5,0]],[[1,9]]]
[[[0,3],[2,1],[1,2]],[[5,0]],[[1,9]]] :: Num τ => [[[τ]]]

or a list of lists where each of the latter lists are themselves lists that all have the same sum. The Num τ => bit is a context that constrains the type τ to be an instance of the typeclass Num. Happy for us, Integer is such a type:

*Main> :info Integer
data Integer
…
instance Num Integer -- Defined in GHC.Num
…

We know then that the type of the partition is [[[Integer]]]. This typeclass nonsense may seem unnecessarily fussy, but we’ll need the concept again in just a moment. (To give you an idea of what’s going on, the typechecker doesn’t have enough information to decide whether the literal 0, for example, is of type Int or Integer.)

Each sublist contains lists with the same sum. In other words, there exists a mapping from a sum to a list of integer lists. Therefore, the type of the Map used in bucketBy must resemble

Map Integer [[Integer]]

For example, with the sum 3 we associate the list

[ [0,3]
, [2,1]
, [1,2]
]

The fold recursion pattern

Folding is a highly general pattern. Left fold, foldl and friends in Haskell lets you “insert” an operator between elements of a list beginning with the zero value at the left end of the list. For example, the sum of [5,3,9,1] expressed as a left fold is

((((0 + 5) + 3) + 9) + 1)

or

foldl (+) 0 [5,3,9,1]

That is, beginning with a base value of zero, we successively add elements of the list and accumulate the sum.

Recall the definition of bucketBy contains

elems . foldl' go empty

This means the result of the left fold must be of type Map Integer [[Integer]], the zero value for our fold is the empty Map of that type, and go is somehow adding each successive value of a list into the map.

Note that foldl' is the strict cousin of foldl, but strictness is beyond the scope of this answer. (See also “Stack overflow” on HaskellWiki.)

Dude, where’s my list?

Given the type of foldl'

*Main> :t foldl'
foldl' :: (a -> b -> a) -> a -> [b] -> a

we should have three arguments in the application, but only two are present in the code above. This is because the code is written in point-free style. Your list is there implicitly due to partial application of foldl'.

Think back to the sum-as-fold example above. The type of that application without the final argument is

*Main> :t foldl (+) 0
foldl (+) 0 :: Num b => [b] -> b

Partial application allows us to create new functions. Here we defined a function that computes a number from some list of numbers. Hmm, sounds familiar.

*Main> :t sum
sum :: Num a => [a] -> a

The . combinator expresses function composition. Its name is chosen to resemble the notation g∘f as commonly seen in mathematics textbooks to mean “do f first and then compute g from the result.” This is exactly what’s happening in the definition of bucketBy: fold the list of values into a Map and then get the values of out the Map.

If ya gotta go, go with a smile

In your comment, you asked about the purpose of m. With an explicit type annotation, we might define go as

...
  where go :: Map Integer [[Integer]] -> [Integer] -> Map Integer [[Integer]]
        go m l = insertWith' (++) (eq l) [l] m

Matching variables with types, m is the Map we’ve accumulated so far, and l is the next Integer list that we want to toss into the appropriate bucket. Recall that eq is an argument to the outer bucketBy.

We can control how a new item goes into the map using insertWith'. (By convention, functions whose names end with trailing quotes are strict variants.)

The (++) combinator appends lists. The application eq l determines the appropriate bucket for l.

Had we written l rather than [l], the result would want to be

*Main> bucketBy sum x
[[0,3,2,1,1,2],[5,0],[1,9]]

but then we lose the structure of the innermost lists.

We’ve already constrained the type of bucketBy's result to be [[[α]]] and thus the type of the Map's elements. Say the next item l to fold is [1,2]. We want to append, (++), it to some other list of type [[Integer]], but the types don’t match.

*Main> [[0,3],[2,1]] ++ [1,2]

<interactive>:1:21:
    No instance for (Num [t0])
      arising from the literal `2'
    Possible fix: add an instance declaration for (Num [t0])
    In the expression: 2
    In the second argument of `(++)', namely `[1, 2]'
    In the expression: [[0, 3], [2, 1]] ++ [1, 2]

Wrapping l gets us

*Main> [[0,3],[2,1]] ++ [[1,2]]
[[0,3],[2,1],[1,2]]

Generalizing further

You might stop with

bucketBy :: ([Integer] -> Integer) -> [[Integer]] -> [[[Integer]]]
bucketBy eq = elems . foldl' go empty
  where go m l = insertWith' (++) (eq l) [l] m

or even

bucketBy :: ([Integer] -> Integer) -> [[Integer]] -> [[[Integer]]]
bucketBy eq = elems . foldl' go empty
  where go :: Map Integer [[Integer]] -> [Integer] -> Map Integer [[Integer]]
        go m l = insertWith' (++) (eq l) [l] m

and be perfectly happy because it handles the case from your question.

Suppose down the road you have a different list y defined as

y :: [[Int]]
y = [[1,2],[2,1],[5,0],[0,3],[1,9]]

Even though the definition is very nearly identical to x, bucketBy is of no use with y.

*Main> bucketBy sum y

<interactive>:1:15:
    Couldn't match expected type `Integer' with actual type `Int'
    Expected type: [[Integer]]
      Actual type: [[Int]]
    In the second argument of `bucketBy', namely `y'
    In the expression: bucketBy sum y

Let’s assume you can’t change the type of y for some reason. You might copy-and-paste to create another function, say bucketByInt, where the only change is replacing Integer with Int in the type annotations.

This would be highly, highly unsatisfying.

Maybe later you have some list of strings that you want to bucket according to the length of the longest string in each. In this imaginary paradise you could

*Main> bucketBy (maximum . map length) [["a","bc"],["d"],["ef","g"],["hijk"]]
[[["d"]],[["ef","g"],["a","bc"]],[["hijk"]]]

What you want is entirely reasonable: bucket some list of things using the given criterion. But alas

*Main> bucketBy (maximum . map length) [["a","bc"],["d"],["ef","g"],["hijk"]]

<interactive>:1:26:
    Couldn't match expected type `Integer' with actual type `[a0]'
    Expected type: Integer -> Integer
      Actual type: [a0] -> Int
    In the first argument of `map', namely `length'
    In the second argument of `(.)', namely `map length'

Again, you may be tempted to write bucketByString, but by this point, you’re ready to move away and become a shoe cobbler.

The typechecker is your friend. Go back to your definition of bucketBy that’s specific to Integer lists, simply comment out the type annotation and ask its type.

*Main> :t bucketBy
bucketBy :: Ord k => (b -> k) -> [b] -> [[b]]

Now you can apply bucketBy for the different cases above and get the expected results. You were already in paradise but didn’t know it!

Now, in keeping with good style, you provide annotations for the toplevel definition of bucketBy to help the poor reader, perhaps yourself. Note that you must provide the Ord constraint due to the use of insertWith', whose type is

insertWith' :: Ord k => (a -> a -> a) -> k -> a -> Map k a -> Map k a

You may want to be really explicit and give an annotation for the inner go, but this requires use of the scoped type variables extension.

{-# LANGUAGE ScopedTypeVariables #-}

import Data.List (foldl')
import Data.Map (Map,elems,empty,insertWith')

bucketBy :: forall a b. Ord b => (a -> b) -> [a] -> [[a]]
bucketBy eq = elems . foldl' go empty
  where go :: Map b [a] -> a -> Map b [a]
        go m l = insertWith' (++) (eq l) [l] m

Without the extension and with a type annotation of

bucketBy :: Ord b => (a -> b) -> [a] -> [[a]]

the typechecker will fail with errors of the form

    Could not deduce (b ~ b1)
    from the context (Ord b)
      bound by the type signature for
                 bucketBy :: Ord b => (a -> b) -> [a] -> [[a]]
      at prog.hs:(10,1)-(12,46)
      `b' is a rigid type variable bound by
          the type signature for
            bucketBy :: Ord b => (a -> b) -> [a] -> [[a]]
          at prog.hs:10:1
      `b1' is a rigid type variable bound by
           the type signature for go :: Map b1 [a1] -> a1 -> Map b1 [a1]
           at prog.hs:12:9
    In the return type of a call of `eq'
    In the second argument of `insertWith'', namely `(eq l)'
    In the expression: insertWith' (++) (eq l) [l] m

This is because the typechecker treats the b on the inner type annotation as a distinct and entirely unrelated type b1 even though a human reader plainly sees the intent that they be the same type.

Read the scoped type variables documentation for details.

One last small surprise

You may wonder where the outer layer of brackets went. Notice that the type annotation generalized from

bucketBy :: ([Integer] -> Integer) -> [[Integer]] -> [[[Integer]]]

to

bucketBy :: forall a b. Ord b => (a -> b) -> [a] -> [[a]]

Note that [Integer] is itself another type, represented here as a.

like image 103
Greg Bacon Avatar answered Sep 23 '22 13:09

Greg Bacon


groupBy splits the list into chunks of adjacent elements satisfying the given predicate. Since in your case, the [0,3] is separated from the [1,2] and [2,1], the first group includes only these. To collect all elements of the list having the same sum into one group, you need some preprocessing, e.g. with sortBy.

import Data.List
import Data.Function
import Data.Ord

groupBySum :: Num a => [[a]] -> [[[a]]]
groupBySum xss = groups
  where
    ys = map (\xs -> (sum xs,xs)) xss
    sortedSums = sortBy (comparing fst) ys
    groupedSums = groupBy ((==) `on` fst) sortedSums
    groups = map (map snd) groupedSums
like image 42
Daniel Fischer Avatar answered Sep 23 '22 13:09

Daniel Fischer


From hackage:

The group function takes a list and returns a list of lists such that the concatenation of the result is equal to the argument.

groupBy is the same, except that you can specify your equality test. Thus, since in your input list [0,3] is not adjacent to [1,2] or [2,1], it is put on its own.

like image 20
Riccardo T. Avatar answered Sep 22 '22 13:09

Riccardo T.