Suppose I have the following df.
df = pd.DataFrame({
'A':['x','y','x','y'],
'B':['a','b','a','b'],
'C':[1,10,100,1000],
'D':['w','v','v','w']
})
A B C D
0 x a 1 w
1 y b 10 v
2 x a 100 v
3 y b 1000 w
I want to group by columns A and B, sum column C, and keep the value from D which is the same row of the maximum group value of C. Like this:
A B C D
x a 101 v
y b 1010 w
So far, I have this:
df.groupby(['A','B']).agg({'C':sum})
A B C
x a 101
y b 1010
What function do I have to aggregate column D with?
You can use DataFrameGroupBy.idxmax
for indices of max values of C
with loc
:
#unique index
df.reset_index(drop=True, inplace=True)
df1 = df.groupby(['A','B'])['C'].agg(['sum', 'idxmax'])
df1['idxmax'] = df.loc[df1['idxmax'], 'D'].values
df1 = df1.rename(columns={'idxmax':'D','sum':'C'}).reset_index()
Similar solution with map
:
df1 = df.groupby(['A','B'])['C'].agg(['sum', 'idxmax']).reset_index()
df1['idxmax'] = df1['idxmax'].map(df['D'])
df1 = df1.rename(columns={'idxmax':'D','sum':'C'})
print (df1)
A B C D
0 x a 101 v
1 y b 1010 w
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With