How to group similar items in a list using Haskell?

Tags:

haskell

Given a list of tuples like this:

dic = [(1,"aa"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg"),(1,"bb")]

How can I group the items of dic into a list grp, where

grp = [(1,["aa","bb","cc"]), (2, ["aa"]), (3, ["ff","gg"])]

I'm a newcomer to Haskell, and I seem to be falling in love with it.
Using group or groupBy from Data.List only groups equal items that are adjacent in the list. I wrote an inefficient function for this, but it runs into memory failures because I need to process a very large list of coded strings. I hope you can help me find a more efficient way.
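To illustrate the adjacency problem, here is a minimal sketch of what groupBy does with the sample list. The name adjacentOnly is made up for this example, and the Int key type is an assumption.

```haskell
import Data.Function (on)
import Data.List (groupBy)

dic :: [(Int, String)]
dic = [(1,"aa"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg"),(1,"bb")]

-- groupBy only merges neighbouring elements, so the trailing (1,"bb")
-- ends up in its own group instead of joining the first one.
adjacentOnly :: [[(Int, String)]]
adjacentOnly = groupBy ((==) `on` fst) dic

main :: IO ()
main = print adjacentOnly
-- prints [[(1,"aa"),(1,"cc")],[(2,"aa")],[(3,"ff"),(3,"gg")],[(1,"bb")]]
```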


asked Sep 13 '12 02:09

td123


2 Answers

Whenever possible, reuse library code.

import Data.Map (fromListWith)

sortAndGroup assocs = fromListWith (++) [(k, [v]) | (k, v) <- assocs]

Try it out in ghci:

*Main> sortAndGroup [(1,"aa"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg"),(1,"bb")]
fromList [(1,["bb","cc","aa"]),(2,["aa"]),(3,["gg","ff"])]

EDIT: In the comments, some folks are worried about whether (++) or flip (++) is the right choice. The documentation doesn't say which way things get associated; you can find out by experimenting, or you can sidestep the whole issue using difference lists:

sortAndGroup assocs = ($ []) <$> fromListWith (.) [(k, (v:)) | (k, v) <- assocs]
-- OR (this version assumes: import qualified Data.Map as M)
sortAndGroup = fmap ($ []) . M.fromListWith (.) . map (fmap (:))

These alternatives are about the same length as the original, but they're a bit less readable to me.
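If the list shape from the question is needed rather than a Map, one possible follow-up is to flatten the map with toList. The sketch below is only an illustration: the name groupToList is hypothetical, and sorting each value list is an extra step added here (it needs Ord on the values) so that the result matches the grp in the question.

```haskell
import Data.List (sort)
import qualified Data.Map.Strict as M

-- Hypothetical helper: group with a Map, sort each value list,
-- then flatten to an association list in ascending key order.
groupToList :: (Ord k, Ord v) => [(k, v)] -> [(k, [v])]
groupToList = M.toList . fmap sort . M.fromListWith (++) . map (fmap (: []))

main :: IO ()
main = print (groupToList [(1,"aa"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg"),(1,"bb")])
-- prints [(1,["aa","bb","cc"]),(2,["aa"]),(3,["ff","gg"])]
```

Dropping the fmap sort step keeps the values in the order that fromListWith (++) produces, as in the ghci session above.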


answered Sep 17 '22 06:09

Daniel Wagner

Here's my solution:

import Data.Function (on)
import Data.List (sortBy, groupBy)
import Data.Ord (comparing)

myGroup :: (Eq a, Ord a) => [(a, b)] -> [(a, [b])]
myGroup = map (\l -> (fst . head $ l, map snd l)) . groupBy ((==) `on` fst)
          . sortBy (comparing fst)

This works by first sorting the list by key with sortBy. The sort is stable, so values that share a key keep their input order:

[(1,"aa"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg"),(1,"bb")]
=> [(1,"aa"),(1,"cc"),(1,"bb"),(2,"aa"),(3,"ff"),(3,"gg")]

then grouping the list elements by the associated key with groupBy:

[(1,"aa"),(1,"cc"),(1,"bb"),(2,"aa"),(3,"ff"),(3,"gg")]
=> [[(1,"aa"),(1,"cc"),(1,"bb")],[(2,"aa")],[(3,"ff"),(3,"gg")]]

and then transforming the grouped items to tuples with map:

[[(1,"aa"),(1,"cc"),(1,"bb")],[(2,"aa")],[(3,"ff"),(3,"gg")]]
=> [(1,["aa","cc","bb"]),(2,["aa"]),(3,["ff","gg"])]

Testing:

> myGroup dic
[(1,["aa","cc","bb"]),(2,["aa"]),(3,["ff","gg"])]
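If the values under each key should come out sorted as well, as in the grp from the question, one option is to sort whole pairs instead of sorting by key only. The following is a sketch under that assumption, not part of the original solution: the name myGroupSorted is made up, it adds an Ord constraint on the values, and it uses groupBy from Data.List.NonEmpty so that no partial head on plain lists is needed.

```haskell
import Data.Function (on)
import Data.List (sort)
import qualified Data.List.NonEmpty as NE

-- Sorting whole pairs orders by key first and by value second.
-- NE.groupBy returns non-empty groups, so taking the first key is safe.
myGroupSorted :: (Ord a, Ord b) => [(a, b)] -> [(a, [b])]
myGroupSorted = map collect . NE.groupBy ((==) `on` fst) . sort
  where
    collect g = (fst (NE.head g), map snd (NE.toList g))

main :: IO ()
main = print (myGroupSorted [(1,"aa"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg"),(1,"bb")])
-- prints [(1,["aa","bb","cc"]),(2,["aa"]),(3,["ff","gg"])]
```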

answered Sep 21 '22 06:09

Mikhail Glushenkov
