
Is groupby from pandas commutative?

I would like to know if the rows selected by:

groupby(['a', 'b']) 

are the same as the rows selected by:

groupby(['b', 'a'])

In this case the order of the rows doesn't matter.

Is there any case in which groupby does not fulfill the commutative property?

cristian hantig asked Dec 17 '19 13:12


3 Answers

By definition, and given the logic pandas applies in groupby, it will always be commutative:

A groupby operation involves some combination of splitting the object, applying a function, and combining the results.

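This combination behaves like a linear one in the sense that the order of the keys does not change which rows end up in each group, hence it is commutative in that sense. The important point is that when passing multiple by values, the levels of the new index follow the order of the keys, which should be kept in mind when addressing them.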

From Wikipedia's articles on linear combination and the commutative property:

In mathematics, a linear combination is an expression constructed from a set of terms by multiplying each term by a constant and adding the results. The idea that simple operations, such as the multiplication and addition of numbers, are commutative was for many years implicitly assumed.
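
To make the claim about groups concrete, here is a minimal sketch that compares the row labels in each group for both key orders. The sample data and the names groups_ab, groups_ba and same_groups are made up for illustration; only groupby and its groups attribute come from pandas.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question
df = pd.DataFrame({
    'a': ['x', 'x', 'y', 'y', 'x', 'y'],
    'b': [1, 2, 1, 2, 1, 1],
    'c': [7, 8, 9, 4, 2, 3],
})

# .groups maps each group key to the row labels that belong to it
groups_ab = {key: list(labels)
             for key, labels in df.groupby(['a', 'b']).groups.items()}

# Reverse each (b, a) key to (a, b) so the two dicts are comparable
groups_ba = {key[::-1]: list(labels)
             for key, labels in df.groupby(['b', 'a']).groups.items()}

same_groups = groups_ab == groups_ba
print(same_groups)
```

If same_groups is True, every (a, b) combination selects exactly the same rows as the matching (b, a) combination, which is what the question asks about.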

Celius Stingher answered Oct 13 '22 01:10


I think the order of the keys does not matter for the resulting values. The only difference is that, after groupby, the first columns / index levels come out in the same order as the columns in the list.

import pandas as pd

df = pd.DataFrame({
    'a': list('aaaaaa'),
    'b': [4, 5, 4, 5, 5, 4],
    'c': [7, 8, 9, 4, 2, 3],
})

Order of levels after groupby aggregation:

df1 = df.groupby(['a', 'b']).sum()
print (df1)
      c
a b    
a 4  19
  5  14

df2 = df.groupby(['b', 'a']).sum()
print (df2)
      c
b a    
4 a  19
5 a  14

And the order of columns when using as_index=False:

df3 = df.groupby(['a', 'b'], as_index=False).sum()
print (df3)
   a  b   c
0  a  4  19
1  a  5  14

df4 = df.groupby(['b', 'a'], as_index=False).sum()
print (df4)
   b  a   c
0  4  a  19
1  5  a  14
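
The outputs above differ only in the order of the levels or columns. As a rough sketch of how one could confirm that, the snippet below aligns the second result with the first and compares them. The names df2_aligned, df4_aligned, levels_match and columns_match are introduced here for illustration; reorder_levels, sort_index, sort_values and equals are regular pandas methods.

```python
import pandas as pd

df = pd.DataFrame({
    'a': list('aaaaaa'),
    'b': [4, 5, 4, 5, 5, 4],
    'c': [7, 8, 9, 4, 2, 3],
})

df1 = df.groupby(['a', 'b']).sum()
df2 = df.groupby(['b', 'a']).sum()

# Put the index levels of df2 in the same order as df1, then sort the rows
df2_aligned = df2.reorder_levels(['a', 'b']).sort_index()
levels_match = df2_aligned.equals(df1)

df3 = df.groupby(['a', 'b'], as_index=False).sum()
df4 = df.groupby(['b', 'a'], as_index=False).sum()

# Reorder the columns of df4 like df3, then sort rows and reset the index
df4_aligned = (df4[['a', 'b', 'c']]
               .sort_values(['a', 'b'])
               .reset_index(drop=True))
columns_match = df4_aligned.equals(df3)

print(levels_match, columns_match)
```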

If you use a transformation to create a new column with the same size as the original, the result is the same:

df['new1'] = df.groupby(['a', 'b'])['c'].transform('sum')
df['new2'] = df.groupby(['b', 'a'])['c'].transform('sum')
print (df)
   a  b  c  new1  new2
0  a  4  7    19    19
1  a  5  8    14    14
2  a  4  9    19    19
3  a  5  4    14    14
4  a  5  2    14    14
5  a  4  3    19    19

jezrael answered Oct 13 '22 00:10


Yes, the final groups will always be the same.

The only difference is the order in which the rows will be shown.
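
A small sketch of that difference in row order, using made-up data where both keys take more than one value (the frame and the names order_ab and order_ba are assumptions for illustration). With the default sort=True, the result is sorted by the keys in the order they were passed:

```python
import pandas as pd

# Hypothetical sample data with two values in each key column
df = pd.DataFrame({
    'a': ['x', 'y', 'x', 'y'],
    'b': [1, 1, 2, 2],
    'c': [10, 20, 30, 40],
})

order_ab = df.groupby(['a', 'b'])['c'].sum().index.tolist()
order_ba = df.groupby(['b', 'a'])['c'].sum().index.tolist()

print(order_ab)  # [('x', 1), ('x', 2), ('y', 1), ('y', 2)]
print(order_ba)  # [(1, 'x'), (1, 'y'), (2, 'x'), (2, 'y')]

# Same set of groups once the (b, a) keys are flipped back to (a, b)
same_keys = sorted(order_ab) == sorted((a, b) for b, a in order_ba)
print(same_keys)
```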

manuhortet answered Oct 12 '22 23:10