I have a pandas dataframe that looks like this
ID     country   month   revenue  profit   ebit
234    USA       201409   10        5       3
344    USA       201409    9        7       2
532    UK        201410    20       10      5
129    Canada    201411    15       10      5
I want to group by ID, country, month and count the IDs per month and country and sum the revenue, profit, ebit. The output for the above data would be:
 country   month    revenue   profit  ebit   count
   USA     201409     19        12      5      2
   UK      201409     20        10      5      1
   Canada  201411     15        10      5      1
I have tried different variations of groupby, sum and count functions of pandas but I am unable to figure out how to apply groupby sum and count all together to give the result as shown. Please share any ideas that you might have. Thanks!
Use DataFrame. groupby(). sum() to group rows based on one or multiple columns and calculate sum agg function. groupby() function returns a DataFrameGroupBy object which contains an aggregate function sum() to calculate a sum of a given column for each group.
Grouping by Multiple ColumnsYou can do this by passing a list of column names to groupby instead of a single string value.
groupby() can take the list of columns to group by multiple columns and use the aggregate functions to apply single or multiple aggregations at the same time.
Use count() by Column Namegroupby() to group the rows by column and use count() method to get the count for each group by ignoring None and Nan values. It works with non-floating type data as well. The below example does the grouping on Courses column and calculates count how many times each value is present.
It can be done using pivot_table this way:
>>> df1=pd.pivot_table(df, index=['country','month'],values=['revenue','profit','ebit'],aggfunc=np.sum)
>>> df1 
                ebit  profit  revenue
country month                        
Canada  201411     5      10       15
UK      201410     5      10       20
USA     201409     5      12       19
>>> df2=pd.pivot_table(df, index=['country','month'], values='ID',aggfunc=len).rename('count')
>>> df2
country  month 
Canada   201411    1
UK       201410    1
USA      201409    2
>>> pd.concat([df1,df2],axis=1)
                ebit  profit  revenue  count
country month                               
Canada  201411     5      10       15      1
UK      201410     5      10       20      1
USA     201409     5      12       19      2
UPDATE
It can be done in one-line using pivot_table and providing a dict of functions to apply to each column in the aggfunc argument:
pd.pivot_table(
   df,
   index=['country','month'],
   aggfunc={'revenue': np.sum, 'profit': np.sum, 'ebit': np.sum, 'ID': len}
).rename(columns={'ID': 'count'})
                count  ebit  profit  revenue
country month                               
Canada  201411      1     5      10       15
UK      201410      1     5      10       20
USA     201409      2     5      12       19
                        The following solution seems the simplest.
Group by country and month:
grouped_df = df.groupby(['country', 'month'])
Apply sum to columns of interest (revenue, profit, ebit):
final = grouped_df[['revenue', 'profit', 'ebit']].agg('sum')
Assign the size of the grouped_df to a new column in 'final':
final['count'] = grouped_df.size()
print(final)
Out[256]: 
                revenue  profit  ebit  count
country month                               
Canada  201411       15      10     5      1
UK      201410       20      10     5      1
USA     201409       19      12     5      2
All done!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With