Given a DataFrame df1 as follows:
Col1  Col2  Col3  Col4  Col5
-------------------------------------
A     1     AA    10    Test1
A     1     AA    5     Test2
A     2     AB    30    Test3
B     4     FF    10    Test4
C     1     HH    4     Test7
C     3     GG    6     Test8
C     3     GG    7     Test9
D     1     AA    4     Test5
D     3     FF    6     Test6
I want to group by Col1, Col2 and Col3, and:
Add a new column Count: the size of each group
Add a new column Col4_sum: the sum of Col4 within each group
Desired output:
Col1  Col2  Col3  Count  Col4_sum
----------------------------------------
A     1     AA    2      15
A     2     AB    1      30
B     4     FF    1      10
C     1     HH    1      4
C     3     GG    2      13
D     1     AA    1      4
D     3     FF    1      6
I tried
df1.groupby(['Col1','Col2','Col3']).size()
but that gives only the Count column.
Use GroupBy.agg with a list of (new column name, aggregate function) tuples:
df = (df1.groupby(['Col1','Col2','Col3'])['Col4']
         .agg([('Count','size'), ('Col4_sum','sum')])
         .reset_index())
print(df)
  Col1  Col2 Col3  Count  Col4_sum
0    A     1   AA      2        15
1    A     2   AB      1        30
2    B     4   FF      1        10
3    C     1   HH      1         4
4    C     3   GG      2        13
5    D     1   AA      1         4
6    D     3   FF      1         6
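For anyone who wants to run this end to end, here is a minimal, self-contained sketch. It rebuilds df1 from the table in the question (the literal values are copied from there; the variable name `result` is only illustrative) and applies the list-of-tuples form of agg to the Col4 column of each group.

```python
import pandas as pd

# Rebuild the sample frame from the question.
df1 = pd.DataFrame({
    "Col1": ["A", "A", "A", "B", "C", "C", "C", "D", "D"],
    "Col2": [1, 1, 2, 4, 1, 3, 3, 1, 3],
    "Col3": ["AA", "AA", "AB", "FF", "HH", "GG", "GG", "AA", "FF"],
    "Col4": [10, 5, 30, 10, 4, 6, 7, 4, 6],
    "Col5": ["Test1", "Test2", "Test3", "Test4", "Test7",
             "Test8", "Test9", "Test5", "Test6"],
})

# Each tuple is (output column name, aggregation applied to Col4).
result = (
    df1.groupby(["Col1", "Col2", "Col3"])["Col4"]
       .agg([("Count", "size"), ("Col4_sum", "sum")])
       .reset_index()  # turn the group keys back into ordinary columns
)
print(result)
```

Selecting ['Col4'] before agg means both aggregations run on that single column, so 'size' here is simply the number of rows in each group.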
In pandas 0.25+ it is also possible to use named aggregation:
df = (df1.groupby(['Col1','Col2','Col3'])
         .agg(Count=('Col5', 'size'), Col4_sum=('Col4', 'sum'))
         .reset_index())
print(df)
  Col1  Col2 Col3  Count  Col4_sum
0    A     1   AA      2        15
1    A     2   AB      1        30
2    B     4   FF      1        10
3    C     1   HH      1         4
4    C     3   GG      2        13
5    D     1   AA      1         4
6    D     3   FF      1         6
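One detail worth knowing with named aggregation: 'size' counts every row in the group, whereas 'count' skips missing values. The sketch below illustrates the difference. It assumes pandas 0.25 or later, and it deliberately blanks out one Col5 value, which is not in the original data; the `NonNull` column and the names `rows` and `out` are hypothetical, added only for comparison.

```python
import pandas as pd

# Sample rows from the question, with one Col5 value replaced by None
# (an assumption made here purely to show size vs. count).
rows = [
    ("A", 1, "AA", 10, None),      # originally "Test1"
    ("A", 1, "AA", 5, "Test2"),
    ("A", 2, "AB", 30, "Test3"),
    ("B", 4, "FF", 10, "Test4"),
]
df1 = pd.DataFrame(rows, columns=["Col1", "Col2", "Col3", "Col4", "Col5"])

out = (
    df1.groupby(["Col1", "Col2", "Col3"])
       .agg(
           Count=("Col5", "size"),     # rows per group, missing values included
           NonNull=("Col5", "count"),  # hypothetical extra column: non-missing only
           Col4_sum=("Col4", "sum"),
       )
       .reset_index()
)
print(out)
```

So if the goal is the group size, 'size' is the safer choice; which column it is paired with does not affect the result.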