Python pandas: mean and sum groupby on different columns at the same time

Tags: python, pandas

I have a pandas dataframe which looks like the following:

Name    Missed    Credit    Grade
A       1         3         10
A       1         1         12
B       2         3         10
B       1         2         20

And my desired output is:

Name    Sum1    Sum2    Average
A       2       4       11
B       3       5       15

Basically, I want the sum of the Missed and Credit columns and the average of Grade for each Name. Right now I run two separate groupby operations on Name, one for the sums and one for the average, and then merge the two resulting dataframes, which does not seem to be the best way of doing this. I also found the following on SO, which makes sense if I only want to work on one column:

df.groupby('Name')['Credit'].agg(['sum', 'mean'])

But how can I do this in a single statement when different columns need different aggregations?

asked Feb 21 '18 15:02 by ahajib



2 Answers

Pass a dictionary to agg that maps each column to its aggregation function, then rename the resulting columns:

d = {'Missed': 'Sum1', 'Credit': 'Sum2', 'Grade': 'Average'}
df = df.groupby('Name').agg({'Missed': 'sum', 'Credit': 'sum', 'Grade': 'mean'}).rename(columns=d)
print(df)
      Sum1  Sum2  Average
Name
A        2     4       11
B        3     5       15

If you also want Name as a regular column instead of the index, pass as_index=False:

df = (df.groupby('Name', as_index=False)
        .agg({'Missed': 'sum', 'Credit': 'sum', 'Grade': 'mean'})
        .rename(columns={'Missed': 'Sum1', 'Credit': 'Sum2', 'Grade': 'Average'}))
print(df)
  Name  Sum1  Sum2  Average
0    A     2     4       11
1    B     3     5       15
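As a side note, the same flat layout can also be reached by aggregating with Name as the index and calling reset_index() afterwards. The sketch below compares the two variants on the sample data from the question; the variable names spec, names, flat and via_reset are only illustrative and do not come from the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Name': ['A', 'A', 'B', 'B'],
    'Missed': [1, 1, 2, 1],
    'Credit': [3, 1, 3, 2],
    'Grade': [10, 12, 10, 20],
})

# Illustrative names: which aggregation each column gets, and the new labels.
spec = {'Missed': 'sum', 'Credit': 'sum', 'Grade': 'mean'}
names = {'Missed': 'Sum1', 'Credit': 'Sum2', 'Grade': 'Average'}

# Variant 1: keep Name out of the index from the start.
flat = df.groupby('Name', as_index=False).agg(spec).rename(columns=names)

# Variant 2: aggregate with Name as the index, then move it back to a column.
via_reset = df.groupby('Name').agg(spec).rename(columns=names).reset_index()

print(flat)
print(via_reset)
```

Which variant to use is mostly a matter of taste; as_index=False saves one call, while reset_index() is handy when the grouped result is needed with the index first.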

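Alternatively, use named aggregation (available since pandas 0.25), which sets the output column names directly so no rename is needed: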

df = df.groupby('Name', as_index=False).agg(Sum1=('Missed', 'sum'),
                                            Sum2=('Credit', 'sum'),
                                            Average=('Grade', 'mean'))
print(df)
  Name  Sum1  Sum2  Average
0    A     2     4       11
1    B     3     5       15
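For anyone who wants to try the named-aggregation form without an existing dataframe, here is a minimal self-contained sketch. It rebuilds the sample data from the question and stores the output in a separate variable (result, a name chosen here for illustration) so the original df is not overwritten.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Name': ['A', 'A', 'B', 'B'],
    'Missed': [1, 1, 2, 1],
    'Credit': [3, 1, 3, 2],
    'Grade': [10, 12, 10, 20],
})

# Named aggregation: each keyword is the output column name, and each value
# is a (source column, aggregation function) pair.
result = df.groupby('Name', as_index=False).agg(
    Sum1=('Missed', 'sum'),
    Sum2=('Credit', 'sum'),
    Average=('Grade', 'mean'),
)

print(result)
print(result.dtypes)
```

Note that 'mean' produces floating-point values, so depending on the pandas version and display settings the Average column may print as 11.0 and 15.0 rather than 11 and 15.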
answered Oct 02 '22 02:10 by jezrael


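import pandas as pd

# Rebuild the sample data from the question
A = pd.DataFrame.from_dict({'Name': ['A', 'A', 'B', 'B'],
                            'Missed': [1, 1, 2, 1],
                            'Credit': [3, 1, 3, 2],
                            'Grade': [10, 12, 10, 20]})

# Sum Missed and Credit and average Grade for each Name
A.groupby('Name').agg({'Missed': 'sum', 'Credit': 'sum', 'Grade': 'mean'})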
answered Oct 02 '22 01:10 by ashish trehan