Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to sum and to mean one DataFrame to create another DataFrame

After creating DataFrame with some duplicated cell values in the column Name:

import pandas as pd
df = pd.DataFrame({'Name': ['Will','John','John','John','Alex'],
                   'Payment':  [15, 10, 10, 10, 15],
                   'Duration':    [30, 15, 15, 15, 20]})

enter image description here

I would like to proceed by creating another DataFrame where the duplicated values in Name column are consolidated leaving no duplicates. At the same time I want to sum the payments values John made. I proceed with:

df_sum = df.groupby('Name', axis=0).sum().reset_index()

enter image description here

But since df.groupby('Name', axis=0).sum() command applies the sum function to every column in DataFrame the Duration (of the visit in minutes) column is processed as well. Instead I would like to get an average values for the Duration column. So I would need to use mean() method, like so:

df_mean = df.groupby('Name', axis=0).mean().reset_index()

enter image description here

But with mean() function the column Payment is now showing the average payment values John made and not the sum of all the payments.

How to create a DataFrame where Duration values show the average values while the Payment values show the sum?

like image 912
alphanumeric Avatar asked Sep 03 '16 17:09

alphanumeric


1 Answers

You can apply different functions to different columns with groupby.agg:

df.groupby('Name').agg({'Duration': 'mean', 'Payment': 'sum'})
Out: 
      Payment  Duration
Name                   
Alex       15        20
John       30        15
Will       15        30
like image 58
ayhan Avatar answered Nov 14 '22 23:11

ayhan