
Pandas: Group by two columns to get sum of another column

Tags:

group-by

I looked through most of the previously asked questions but could not find an answer to mine:

I have the following DataFrame:

           id   year month score num_attempts
0      483625  2010    01   50      1
1      967799  2009    03   50      1
2      213473  2005    09  100      1
3      498110  2010    12   60      1
5      187243  2010    01  100      1
6      508311  2005    10   15      1
7      486688  2005    10   50      1
8      212550  2005    10  500      1
10     136701  2005    09   25      1
11     471651  2010    01   50      1

I want to get the following DataFrame:

year month sum_score sum_num_attempts
2009    03   50           1
2005    09  125           2
2010    12   60           1
2010    01  200           2
2005    10  565           3

Here is what I tried:

sum_df = df.groupby(by=['year','month'])['score'].sum()

But this doesn't look efficient or correct. If more than one column needs to be aggregated, this seems like a very expensive call. For example, I have another column, num_attempts, and want to sum it by year and month in the same way as score.


asked Nov 11 '16 17:11

add-semi-colons

1 Answer

This should be an efficient way:

sum_df = df.groupby(['year','month']).agg({'score': 'sum', 'num_attempts': 'sum'})

answered Sep 23 '22 09:09

Dennis Golomazov

