Strange behavior of groupBy

Q: What does Groupby mean and what is it used for?

A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups. Used to determine the groups for the groupby.

Q: What is possible using Groupby () method of pandas?

groupby() function is used to split the data into groups based on some criteria. pandas objects can be split on any of their axes. The abstract definition of grouping is to provide a mapping of labels to group names. sort : Sort group keys.

Q: Is pandas Groupby lazy?

It is mentioned in this tutorial that pandas groupby object is lazy. it's lazy in nature. It doesn't really do any operations to produce a useful result until you say so.

Q: What does Groupby size do?

groupby() function is one of the most useful function in the library it splits the data into groups based on columns/conditions and then apply some operations eg. size() which counts the number of entries/rows in each group. The groupby() can also be applied on series.

Tags:

haskell

I am wondering why the following call of groupBy does not work: My predicate is x < y, so I expect [1, 6] to be a group, but instead, Haskell put [1, 6, 4, 2] into a group.

Prelude Data.List> groupBy (\x y -> x < y) [8,5,3,2,1,6,4,2]
[[8],[5],[3],[2],[1,6,4,2]]

More strangely, when I change the last number to -2, I expect the same behavior as in the above example. That is, since both 2 and -2 are less than 4, I expect that in the result [1, 6, 4, -2] would make up a group. But instead, This time, Haskell put -2 to be a group.

Prelude Data.List> groupBy (\x y -> x < y) [8,5,3,2,1,6,4,-2]
[[8],[5],[3],[2],[1,6,4],[-2]]

Do I have a wrong understanding of groupBy?

420

asked Jul 31 '21 10:07

gefei

2 Answers

In the implementation of the groupBy, x is always the first item of the sublist. Indeed, groupBy is implemented as:

groupBy                 :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _  []           =  []
groupBy eq (x:xs)       =  (x:ys) : groupBy eq zs
                           where (ys,zs) = span (eq x) xs

especially the span (eq x) is important here, since x will be the first item of a new group.

Since x is thus not the previous value in the list. If we thus run groupBy with the list [5, 3, 2, 1, 6, 4, -2], we get:

list	current list	x=?	check with	outcome
`[5,3,2,1,6,4,-2]`	`[8]`	`8`	`/`	`/`
`[5,3,2,1,6,4,-2]`	`[8]`	`8`	`5`	`False`
`[3,2,1,6,4,-2]`	`[5]`	`5`	`/`	`/`
`[3,2,1,6,4,-2]`	`[5]`	`5`	`3`	`False`
`[3,2,1,6,4,-2]`	`[3]`	`3`	`/`	`/`
`[2,1,6,4,-2]`	`[3]`	`3`	`2`	`False`
`[2,1,6,4,-2]`	`[2]`	`2`	`1`	`False`
`[1,6,4,-2]`	`[2]`	`2`	`/`	`/`
`[1,6,4,-2]`	`[2]`	`2`	`1`	`False`
`[6,4,-2]`	`[1]`	`1`	`/`	`/`
`[4,-2]`	`[1,6]`	`1`	`6`	`True`
`[-2]`	`[1,6,4]`	`1`	`4`	`True`
`[]`	`[-2]`	`-2`	`/`	`/`

Especially the case where we compare x=1 and y=4 is important. If x was only the previous value, we should start generating a new list, but since x is the first item of the list, that is not the case.

Normally you should only work with an equivalence relation ~ [wiki], such relation is:

reflexive: so x ~ x is true;
symmetric: so x ~ y if and only if y ~ x; and
transitive: so x ~ y and y ~ z implies that x ~ z.

Your equivalence relation is not reflexive, nor is it symmetric. This is thus not a valid function to work with groupBy.

answered Oct 22 '22 09:10

Willem Van Onsem

The conceptual definition of groupBy p l is that it yields sublists of l such that for each xs in l, you have

all (==True) [p x y | x<-xs, y<-xs]

IOW, each sublist should be part of an equivalence class of p. That notion only makes sense if p is an equivalence relation. In particular, you need p x y ≡ p y x, and the defining equation also assumes that p x x is always true.

The implementation in the standard libraries shows that idea quite clearly: each x:ys list in the result has ys defined by the span of elements that are equivalent to x by the relation. So in your case, you get 1:[6,4,2], where 6,4,2 are all greater than 1.

Evidently, groupBy doesn't actually check p x y for all pairs of elements in the result lists, so this really only makes sense if p is indeed an equivalence relation.

What you expected the idea to be – and IMO this is not unreasonable – is that only for all x,y such that x is the left neighbour of y, you want p x y to hold. This is in general a weaker condition, but if p is an equivalence relation then it actually implies the original condition, because such a relation also is transitive. So maybe the implementation should actually be

groupBy' :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy' _ [] = []
groupBy' _ (x:l) = (x:xs) : zss
 where (xs,zss) = case l of
        [] -> ([],[])
        zs@(y:_)
         -> let ys:zss' = groupBy' p zs
            in if p x y then (ys, zss')
                        else ([], ys:zss')

(This could be simplified a bit, but then it wouldn't be as lazy as the old implementation.)

answered Oct 22 '22 11:10

leftaroundabout

Related questions
                            
                                Does (->) have a data constructor?
                            
                                What are simple definitions for Control.Lens.Traversal's partsOf, holesOf and singular?
                            
                                TypeLits or Singletons: Promoting an `Integer` to `KnownNat` (`Nat`) at Runtime
                            
                                Why doesn't ghci provide the expected Ambiguous type variable error in this scenario?
                            
                                How do I specify names for missing commands in the help message generated by optparse-applicative?
                            
                                How can I get GHC Core output via stack?
                            
                                Getting even and odd position of elements in list - Haskell Mutual Recursion
                            
                                Why systematically mark record fields strict in Haskell?
                            
                                Applicative instance for non-empty leafy tree in Haskell
                            
                                What are the advantages of showsPrec over show?
                            
                                What are the difference in usage between Either and Except in Haskell?
                            
                                Why sequence requires monad if applicative would suffice?
                            
                                How to parse ternary expression using Parsec?
                            
                                How do I lazily transform a list into this type?
                            
                                Is the functor distribution law for the `Alt` typeclass trivial?
                            
                                Can adding a constraint cause other constraints to go out of scope?
                            
                                Trying to understanding why this function using foldr in Haskell isnt working
                            
                                Can some explain the reflection package API in *very* simple terms?
                            
                                How do i unset :set -v from ghci?
                            
                                Haskell computation performance

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With