LINQ library in .NET framework does have a very useful function called GroupBy, which I have been using all the time. Its type in Haskell would look like
Ord b => (a-> b) -> [a] -> [(b, [a])]
Its purpose is to classify items based on the given classification function f
into buckets, with each bucket containing similar items, that is (b, l)
such that for any item x
in l
, f x == b
.
Its performance in .NET is O(N) because it uses hash-tables, but in Haskell I am OK with O(N*log(N)).
I can't find anything similar in standard Haskell libraries. Also, my implementation in terms of standard functions is somewhat bulky:
myGroupBy :: Ord k => (a -> k) -> [a] -> [(k, [a])]
myGroupBy f = map toFst
. groupBy ((==) `on` fst)
. sortBy (comparing fst)
. map (\a -> (f a, a))
where
toFst l@((k,_):_) = (k, map snd l)
This is definitely not something I want to see amongst my problem-specific code.
My question is: how can I implement this function nicely exploiting standard libraries to their maximum?
Also, the seeming absence of such a standard function hints that it may rarely be needed by experienced Haskellers because they may know some better way. Is that true? What can be used to implement similar functionality in a better way?
Also, what would be the good name for it, considering groupBy
is already taken? :)
GHC.Exts.groupWith
groupWith :: Ord b => (a -> b) -> [a] -> [[a]]
Introduced as part of generalised list comprehensions: http://www.haskell.org/ghc/docs/7.0.2/html/users_guide/syntax-extns.html#generalised-list-comprehensions
Using Data.Map
as the intermediate structure:
import Control.Arrow ((&&&))
import qualified Data.Map as M
myGroupBy f = M.toList . M.fromListWith (++) . map (f &&& return)
The map
operation turns the input list into a list of keys paired with singleton lists containing the elements. M.fromListWith (++)
turns this into a Data.Map
, concatenating when two items have the same key, and M.toList
gets the pairs back out again.
Note that this reverses the lists, so adjust for that if necessary. It is also easy to replace return
and (++)
with other monoid-like operations if you for example only wanted the sum of the elements in each group.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With