I am wondering why the following call of groupBy
does not work: My predicate is x < y
, so I expect [1, 6]
to be a group, but instead, Haskell put [1, 6, 4, 2]
into a group.
Prelude Data.List> groupBy (\x y -> x < y) [8,5,3,2,1,6,4,2]
[[8],[5],[3],[2],[1,6,4,2]]
More strangely, when I change the last number to -2, I expect the same behavior as in the above example. That is, since both 2 and -2 are less than 4, I expect that in the result [1, 6, 4, -2]
would make up a group. But instead, This time, Haskell put -2 to be a group.
Prelude Data.List> groupBy (\x y -> x < y) [8,5,3,2,1,6,4,-2]
[[8],[5],[3],[2],[1,6,4],[-2]]
Do I have a wrong understanding of groupBy
?
A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups. Used to determine the groups for the groupby.
groupby() function is used to split the data into groups based on some criteria. pandas objects can be split on any of their axes. The abstract definition of grouping is to provide a mapping of labels to group names. sort : Sort group keys.
It is mentioned in this tutorial that pandas groupby object is lazy. it's lazy in nature. It doesn't really do any operations to produce a useful result until you say so.
groupby() function is one of the most useful function in the library it splits the data into groups based on columns/conditions and then apply some operations eg. size() which counts the number of entries/rows in each group. The groupby() can also be applied on series.
In the implementation of the groupBy
, x
is always the first item of the sublist. Indeed, groupBy
is implemented as:
groupBy :: (a -> a -> Bool) -> [a] -> [[a]] groupBy _ [] = [] groupBy eq (x:xs) = (x:ys) : groupBy eq zs where (ys,zs) = span (eq x) xs
especially the span (eq x)
is important here, since x
will be the first item of a new group.
Since x
is thus not the previous value in the list. If we thus run groupBy
with the list [5, 3, 2, 1, 6, 4, -2]
, we get:
list | current list | x=? | check with | outcome |
---|---|---|---|---|
[5,3,2,1,6,4,-2] |
[8] |
8 |
/ |
/ |
[5,3,2,1,6,4,-2] |
[8] |
8 |
5 |
False |
[3,2,1,6,4,-2] |
[5] |
5 |
/ |
/ |
[3,2,1,6,4,-2] |
[5] |
5 |
3 |
False |
[3,2,1,6,4,-2] |
[3] |
3 |
/ |
/ |
[2,1,6,4,-2] |
[3] |
3 |
2 |
False |
[2,1,6,4,-2] |
[2] |
2 |
1 |
False |
[1,6,4,-2] |
[2] |
2 |
/ |
/ |
[1,6,4,-2] |
[2] |
2 |
1 |
False |
[6,4,-2] |
[1] |
1 |
/ |
/ |
[4,-2] |
[1,6] |
1 |
6 |
True |
[-2] |
[1,6,4] |
1 |
4 |
True |
[] |
[-2] |
-2 |
/ |
/ |
Especially the case where we compare x=1
and y=4
is important. If x
was only the previous value, we should start generating a new list, but since x
is the first item of the list, that is not the case.
Normally you should only work with an equivalence relation ~ [wiki], such relation is:
x ~ x
is true;x ~ y
if and only if y ~ x
; andx ~ y
and y ~ z
implies that x ~ z
.Your equivalence relation is not reflexive, nor is it symmetric. This is thus not a valid function to work with groupBy
.
The conceptual definition of groupBy p l
is that it yields sublists of l
such that for each xs
in l
, you have
all (==True) [p x y | x<-xs, y<-xs]
IOW, each sublist should be part of an equivalence class of p
. That notion only makes sense if p
is an equivalence relation. In particular, you need p x y ≡ p y x
, and the defining equation also assumes that p x x
is always true.
The implementation in the standard libraries shows that idea quite clearly: each x:ys
list in the result has ys
defined by the span
of elements that are equivalent to x
by the relation. So in your case, you get 1:[6,4,2]
, where 6,4,2
are all greater than 1
.
Evidently, groupBy
doesn't actually check p x y
for all pairs of elements in the result lists, so this really only makes sense if p
is indeed an equivalence relation.
What you expected the idea to be – and IMO this is not unreasonable – is that only for all x,y
such that x
is the left neighbour of y
, you want p x y
to hold. This is in general a weaker condition, but if p
is an equivalence relation then it actually implies the original condition, because such a relation also is transitive. So maybe the implementation should actually be
groupBy' :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy' _ [] = []
groupBy' _ (x:l) = (x:xs) : zss
where (xs,zss) = case l of
[] -> ([],[])
zs@(y:_)
-> let ys:zss' = groupBy' p zs
in if p x y then (ys, zss')
else ([], ys:zss')
(This could be simplified a bit, but then it wouldn't be as lazy as the old implementation.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With