Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

GroupBy function from .NET in Haskell

LINQ library in .NET framework does have a very useful function called GroupBy, which I have been using all the time. Its type in Haskell would look like

Ord b => (a-> b) -> [a] -> [(b, [a])]

Its purpose is to classify items based on the given classification function f into buckets, with each bucket containing similar items, that is (b, l) such that for any item x in l, f x == b.

Its performance in .NET is O(N) because it uses hash-tables, but in Haskell I am OK with O(N*log(N)).

I can't find anything similar in standard Haskell libraries. Also, my implementation in terms of standard functions is somewhat bulky:

myGroupBy :: Ord k => (a -> k) -> [a] -> [(k, [a])]
myGroupBy f = map toFst
        . groupBy ((==) `on` fst) 
        . sortBy (comparing fst) 
        . map (\a -> (f a, a))
    where
        toFst l@((k,_):_) = (k, map snd l)

This is definitely not something I want to see amongst my problem-specific code.

My question is: how can I implement this function nicely exploiting standard libraries to their maximum?

Also, the seeming absence of such a standard function hints that it may rarely be needed by experienced Haskellers because they may know some better way. Is that true? What can be used to implement similar functionality in a better way?

Also, what would be the good name for it, considering groupBy is already taken? :)

like image 603
Rotsor Avatar asked May 21 '11 03:05

Rotsor


Video Answer


2 Answers

GHC.Exts.groupWith

groupWith :: Ord b => (a -> b) -> [a] -> [[a]]

Introduced as part of generalised list comprehensions: http://www.haskell.org/ghc/docs/7.0.2/html/users_guide/syntax-extns.html#generalised-list-comprehensions

like image 82
sclv Avatar answered Oct 07 '22 15:10

sclv


Using Data.Map as the intermediate structure:

import Control.Arrow ((&&&))
import qualified Data.Map as M

myGroupBy f = M.toList . M.fromListWith (++) . map (f &&& return)

The map operation turns the input list into a list of keys paired with singleton lists containing the elements. M.fromListWith (++) turns this into a Data.Map, concatenating when two items have the same key, and M.toList gets the pairs back out again.

Note that this reverses the lists, so adjust for that if necessary. It is also easy to replace return and (++) with other monoid-like operations if you for example only wanted the sum of the elements in each group.

like image 26
hammar Avatar answered Oct 07 '22 14:10

hammar