After creating a DataFrame with some duplicated values in the Name column:
import pandas as pd
df = pd.DataFrame({'Name': ['Will','John','John','John','Alex'],
                   'Payment':  [15, 10, 10, 10, 15],
                   'Duration':    [30, 15, 15, 15, 20]})

I would like to create another DataFrame in which the duplicated values in the Name column are consolidated, leaving no duplicates. At the same time, I want to sum the payments John made. I proceed with:
df_sum = df.groupby('Name', axis=0).sum().reset_index()

But since df.groupby('Name', axis=0).sum() applies the sum function to every column in the DataFrame, the Duration column (the length of the visit in minutes) is summed as well. Instead, I would like to get the average value for the Duration column. So I would need to use the mean() method, like so:
df_mean = df.groupby('Name', axis=0).mean().reset_index()

But with mean(), the Payment column now shows the average payment John made rather than the sum of all his payments.
How can I create a DataFrame where the Duration column shows the average while the Payment column shows the sum?
You can apply different functions to different columns with groupby.agg:
df.groupby('Name').agg({'Duration': 'mean', 'Payment': 'sum'})
Out: 
      Duration  Payment
Name                   
Alex      20.0       15
John      15.0       30
Will      30.0       15
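Note that the result above keeps Name as the index. If you want Name back as a regular column, as in the question's reset_index() calls, here is a minimal sketch. It assumes pandas 0.25 or later, where named aggregation is available; the variable name df_agg is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Will', 'John', 'John', 'John', 'Alex'],
                   'Payment': [15, 10, 10, 10, 15],
                   'Duration': [30, 15, 15, 15, 20]})

# Named aggregation: output_column=(input_column, function).
# Payment is summed per name, Duration is averaged per name.
df_agg = (
    df.groupby('Name')
      .agg(Payment=('Payment', 'sum'), Duration=('Duration', 'mean'))
      .reset_index()  # turn the Name index back into a regular column
)
print(df_agg)
```

The keyword form also lets you choose the output column names and their order, which the dict form does not.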