
Pandas: Group by two columns to get sum of another column

I looked through most of the previously asked questions but was not able to find an answer to mine.

I have the following DataFrame:

           id   year month score num_attempts
0      483625  2010    01   50      1
1      967799  2009    03   50      1
2      213473  2005    09  100      1
3      498110  2010    12   60      1
5      187243  2010    01  100      1
6      508311  2005    10   15      1
7      486688  2005    10   50      1
8      212550  2005    10  500      1
10     136701  2005    09   25      1
11     471651  2010    01   50      1

I want to get the following DataFrame:

year month sum_score sum_num_attempts
2009    03   50           1
2005    09  125           2
2010    12   60           1
2010    01  200           3
2005    10  565           3

Here is what I tried:

sum_df = df.groupby(by=['year','month'])['score'].sum()

But this doesn't look efficient or correct. If more than one column needs to be aggregated, this seems like a very expensive call. For example, I also have a num_attempts column that I want to sum by year and month in the same way as score.

asked Nov 11 '16 17:11 by add-semi-colons




1 Answer

This should be an efficient way:

sum_df = df.groupby(['year','month']).agg({'score': 'sum', 'num_attempts': 'sum'})
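For reference, here is a minimal, self-contained sketch of that call on the sample rows from the question. It assumes month is stored as a string (to keep the leading zero) and that a pandas version with named aggregation (0.25 or later) is available. The `flat_df` name and the `sum_score` / `sum_num_attempts` output names are only illustrative choices to match the desired layout.

```python
import pandas as pd

# Sample rows copied from the question; month kept as a string (assumption).
df = pd.DataFrame({
    "id": [483625, 967799, 213473, 498110, 187243,
           508311, 486688, 212550, 136701, 471651],
    "year": [2010, 2009, 2005, 2010, 2010, 2005, 2005, 2005, 2005, 2010],
    "month": ["01", "03", "09", "12", "01", "10", "10", "10", "09", "01"],
    "score": [50, 50, 100, 60, 100, 15, 50, 500, 25, 50],
    "num_attempts": [1] * 10,
})

# The call from the answer: one pass over the groups, one sum per listed column.
# The result is indexed by a (year, month) MultiIndex.
sum_df = df.groupby(["year", "month"]).agg({"score": "sum", "num_attempts": "sum"})

# Variant using named aggregation: as_index=False keeps year and month as
# ordinary columns, and the keyword names become the output column names.
flat_df = df.groupby(["year", "month"], as_index=False).agg(
    sum_score=("score", "sum"),
    sum_num_attempts=("num_attempts", "sum"),
)

print(sum_df)
print(flat_df)
```

By default groupby sorts the group keys, so the rows come back ordered by year and then month rather than in order of first appearance. In the sample data the 2010/01 group contains three rows, so its num_attempts sum is 3.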
answered Sep 23 '22 09:09 by Dennis Golomazov