
Pandas Get Max values within each group, by group sum

Tags: python, pandas

import pandas as pd
import numpy as np
from datetime import datetime
df = pd.DataFrame(
    {
        'Date': np.random.choice(pd.date_range(datetime(2020,1,1), periods=5), 20),
        'Product': np.random.choice(['Milk','Brandy','Beer'], 20),
        'Quantity': np.random.randint(10, 99, 20)
    }
)
df.groupby(['Date','Product']).sum()

This gives the summed Quantity for each (Date, Product) pair:

[image: grouped sums]

I would like to get the maximum of these sums within each group. What is the best way to do it?

The expected result for my random sample would be:

[image: expected result]

How can I achieve this result?

Keerikkattu Chellappan asked Jan 24 '23 16:01



1 Answer

You can chain another groupby, this time on the second level of the index (Product, which is level=1 because levels are zero-based), and take the max:

df.groupby(['Date','Product']).sum().groupby(level=1).max()
         Quantity
Product          
Beer          160
Brandy         97
Milk          245
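
Because the question's sample is random, the numbers above change on every run. As a minimal sketch, the same two-step groupby can be checked by hand on a small fixed frame. The values below are made up for illustration and are not from the question; the level is addressed by name ("Product") rather than by position, which should be equivalent here.

```python
import pandas as pd

# Hypothetical fixed sample (not the question's data) so the result can be verified by hand.
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02", "2020-01-02"]
        ),
        "Product": ["Milk", "Milk", "Milk", "Beer", "Beer"],
        "Quantity": [10, 20, 50, 5, 7],
    }
)

# Step 1: total Quantity per (Date, Product) pair.
# Milk: 30 on 2020-01-01 and 50 on 2020-01-02; Beer: 12 on 2020-01-02.
sums = df.groupby(["Date", "Product"]).sum()

# Step 2: largest daily total per Product (second index level, addressed by name).
max_per_product = sums.groupby(level="Product").max()
print(max_per_product)
```

Here Milk should come out as 50 and Beer as 12, since those are the largest per-day totals for each product.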

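To get the date as well, use sort_values with tail, then reset_index to turn Date and Product back into columns (the numbers below come from a different random sample than the output above):

(
    df.groupby(['Date','Product']).sum()
    .sort_values('Quantity')
    .groupby(level=1)
    .tail(1)
    .reset_index()
)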
        Date Product  Quantity
0 2020-01-04    Beer        81
1 2020-01-03    Milk       186
2 2020-01-03  Brandy       212
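
An alternative that avoids sorting, offered only as a sketch and not part of the original answer, is to look up the index label of each product's largest sum with idxmax and select those rows. The fixed sample below is hypothetical, chosen so the result can be checked by hand.

```python
import pandas as pd

# Hypothetical fixed sample (not the question's data).
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02", "2020-01-02"]
        ),
        "Product": ["Milk", "Milk", "Milk", "Beer", "Beer"],
        "Quantity": [10, 20, 50, 5, 7],
    }
)

# Series of summed Quantity, indexed by (Date, Product).
sums = df.groupby(["Date", "Product"])["Quantity"].sum()

# For each Product, the (Date, Product) label where the summed Quantity is largest.
best_labels = sums.groupby(level="Product").idxmax()

# Select those rows and turn the index levels back into ordinary columns.
result = sums.loc[best_labels.tolist()].reset_index()
print(result)
```

If two dates tie for a product's maximum, idxmax keeps only the first one it encounters, so this returns exactly one row per product.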

Erfan answered Feb 12 '23 05:02
