Groupby sum and count on multiple columns in python

Tags: python, python-3.x, pandas, python-2.7, pandas-groupby

I have a pandas DataFrame that looks like this:

ID     country   month   revenue  profit   ebit
234    USA       201409   10        5       3
344    USA       201409    9        7       2
532    UK        201410    20       10      5
129    Canada    201411    15       10      5
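To make the example reproducible, the sample frame can be built as below. This is only a sketch of the table above; it assumes that `month` is stored as an integer such as 201409 rather than as a string or a date.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: month is an integer (e.g. 201409), not a string or datetime.
df = pd.DataFrame({
    "ID": [234, 344, 532, 129],
    "country": ["USA", "USA", "UK", "Canada"],
    "month": [201409, 201409, 201410, 201411],
    "revenue": [10, 9, 20, 15],
    "profit": [5, 7, 10, 10],
    "ebit": [3, 2, 5, 5],
})
print(df)
```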

I want to group by country and month, count the IDs in each group, and sum revenue, profit, and ebit. The output for the above data would be:

 country   month    revenue   profit  ebit   count
   USA     201409     19        12      5      2
   UK      201410     20        10      5      1
   Canada  201411     15        10      5      1

I have tried different variations of groupby, sum and count functions of pandas but I am unable to figure out how to apply groupby sum and count all together to give the result as shown. Please share any ideas that you might have. Thanks!

asked Feb 13 '18 14:02

N91

2 Answers

This can be done with pivot_table: compute the sums and the counts separately, then concatenate them.

>>> import pandas as pd
>>> df1=pd.pivot_table(df, index=['country','month'],values=['revenue','profit','ebit'],aggfunc='sum')
>>> df1 
                ebit  profit  revenue
country month                        
Canada  201411     5      10       15
UK      201410     5      10       20
USA     201409     5      12       19

>>> df2=pd.pivot_table(df, index=['country','month'], values='ID',aggfunc=len).rename('count')
>>> df2

country  month 
Canada   201411    1
UK       201410    1
USA      201409    2

>>> pd.concat([df1,df2],axis=1)

                ebit  profit  revenue  count
country month                               
Canada  201411     5      10       15      1
UK      201410     5      10       20      1
USA     201409     5      12       19      2

UPDATE

It can be done in a single call to pivot_table by passing aggfunc a dict that maps each column to the function to apply:

pd.pivot_table(
   df,
   index=['country','month'],
   aggfunc={'revenue': 'sum', 'profit': 'sum', 'ebit': 'sum', 'ID': len}
).rename(columns={'ID': 'count'})

                count  ebit  profit  revenue
country month                               
Canada  201411      1     5      10       15
UK      201410      1     5      10       20
USA     201409      2     5      12       19
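As a self-contained sketch of the single-call approach above, the snippet below rebuilds the sample data and uses string aggregation names. Two things here are assumptions rather than part of the answer: `month` is taken to be an integer, and `'count'` (which counts non-null IDs) stands in for `len`. The names `summary` and `flat` are illustrative only.

```python
import pandas as pd

# Sample data from the question; month is assumed to be an integer.
df = pd.DataFrame({
    "ID": [234, 344, 532, 129],
    "country": ["USA", "USA", "UK", "Canada"],
    "month": [201409, 201409, 201410, 201411],
    "revenue": [10, 9, 20, 15],
    "profit": [5, 7, 10, 10],
    "ebit": [3, 2, 5, 5],
})

# One aggregation per column: sums for the money columns, a count for ID.
summary = pd.pivot_table(
    df,
    index=["country", "month"],
    aggfunc={"revenue": "sum", "profit": "sum", "ebit": "sum", "ID": "count"},
).rename(columns={"ID": "count"})

# reset_index turns the (country, month) index back into ordinary columns,
# which is closer to the flat layout shown in the question.
flat = summary.reset_index()
print(flat)
```

If the IDs can contain missing values, `'count'` and `len` will differ, since `len` counts every row in the group.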

answered Sep 23 '22 17:09

Mabel Villalba

The following solution seems the simplest.

Group by country and month:

grouped_df = df.groupby(['country', 'month'])

Apply sum to columns of interest (revenue, profit, ebit):

final = grouped_df[['revenue', 'profit', 'ebit']].agg('sum')

Assign the size of the grouped_df to a new column in 'final':

final['count'] = grouped_df.size()
print(final)

                revenue  profit  ebit  count
country month                               
Canada  201411       15      10     5      1
UK      201410       20      10     5      1
USA     201409       19      12     5      2

All done!
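The same result can probably also be had in one step with named aggregation, which `groupby(...).agg` has supported since pandas 0.25. This is an alternative sketch, not the method from the answer above; it assumes `month` is an integer and uses `'count'` on the ID column, which counts non-null IDs per group.

```python
import pandas as pd

# Sample data from the question; month is assumed to be an integer.
df = pd.DataFrame({
    "ID": [234, 344, 532, 129],
    "country": ["USA", "USA", "UK", "Canada"],
    "month": [201409, 201409, 201410, 201411],
    "revenue": [10, 9, 20, 15],
    "profit": [5, 7, 10, 10],
    "ebit": [3, 2, 5, 5],
})

# Each keyword is an output column: (input column, aggregation name).
final = (
    df.groupby(["country", "month"])
      .agg(
          revenue=("revenue", "sum"),
          profit=("profit", "sum"),
          ebit=("ebit", "sum"),
          count=("ID", "count"),
      )
      .reset_index()  # move country and month from the index into columns
)
print(final)
```

The output columns follow the keyword order, so this form also controls the column layout without a separate rename or reorder.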

answered Sep 23 '22 17:09

matpav
