I have a pandas dataframe that looks like this:
ID   country  month   revenue  profit  ebit
234  USA      201409  10       5       3
344  USA      201409  9        7       2
532  UK       201410  20       10      5
129  Canada   201411  15       10      5
I want to group by country and month, count the IDs in each group, and sum revenue, profit, and ebit. The output for the above data would be:
country  month   revenue  profit  ebit  count
USA      201409  19       12      5     2
UK       201410  20       10      5     1
Canada   201411  15       10      5     1
I have tried different variations of groupby, sum and count functions of pandas but I am unable to figure out how to apply groupby sum and count all together to give the result as shown. Please share any ideas that you might have. Thanks!
Use DataFrame.groupby().sum() to group rows by one or more columns and sum the values in each group. groupby() returns a DataFrameGroupBy object, which provides aggregation methods such as sum() to compute the per-group sum of each column.
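As a minimal sketch of the above, the question's sample data can be grouped by a single column and summed. The variable names here are illustrative, not from the original post.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [234, 344, 532, 129],
    "country": ["USA", "USA", "UK", "Canada"],
    "month": [201409, 201409, 201410, 201411],
    "revenue": [10, 9, 20, 15],
    "profit": [5, 7, 10, 10],
    "ebit": [3, 2, 5, 5],
})

# Group rows by one column and sum the selected numeric columns per group.
totals_by_country = df.groupby("country")[["revenue", "profit", "ebit"]].sum()
print(totals_by_country)
```

Selecting the numeric columns before calling sum() keeps ID and month out of the totals, where adding them up would be meaningless.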
Grouping by Multiple Columns
You can do this by passing a list of column names to groupby() instead of a single string value.
groupby() accepts a list of columns to group by, and the aggregate functions on the resulting object can apply one or several aggregations at the same time.
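One way to apply several aggregations in a single call is pandas' named aggregation, where each keyword passed to agg() names an output column and maps it to a (source column, function) pair. The sketch below applies it to the question's data; treat it as one possible approach, and note that the variable names are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [234, 344, 532, 129],
    "country": ["USA", "USA", "UK", "Canada"],
    "month": [201409, 201409, 201410, 201411],
    "revenue": [10, 9, 20, 15],
    "profit": [5, 7, 10, 10],
    "ebit": [3, 2, 5, 5],
})

# Group by two columns; each keyword names an output column and gives
# the (source column, aggregation) pair used to compute it.
summary = df.groupby(["country", "month"]).agg(
    revenue=("revenue", "sum"),
    profit=("profit", "sum"),
    ebit=("ebit", "sum"),
    count=("ID", "count"),
)
print(summary)
```

The result is indexed by (country, month), with the sums and the ID count side by side.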
Use count() by Column Name
Use groupby() to group the rows by a column, then call count() to get the number of values in each group, ignoring None and NaN values. It works with non-floating-point data as well. For example, grouping on a Courses column and calling count() shows how many times each value is present.
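The difference between counting non-null values and counting rows can be seen in a small sketch. The Courses and Fee columns and their values below are made up for illustration; only the count() and size() calls are the point.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one Spark row has a missing Fee.
courses = pd.DataFrame({
    "Courses": ["Spark", "Spark", "Python", "Python"],
    "Fee": [20000.0, np.nan, 25000.0, 22000.0],
})

# count() skips missing values: Spark -> 1, Python -> 2.
non_null_fees = courses.groupby("Courses")["Fee"].count()

# size() counts rows regardless of missing values: Spark -> 2, Python -> 2.
rows_per_course = courses.groupby("Courses").size()

print(non_null_fees)
print(rows_per_course)
```

If the goal is the number of rows per group, as in the question, size() is the safer choice when the counted column may contain missing values.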
It can be done using pivot_table this way:
>>> df1 = pd.pivot_table(df, index=['country', 'month'], values=['revenue', 'profit', 'ebit'], aggfunc='sum')
>>> df1
                ebit  profit  revenue
country month
Canada  201411     5      10       15
UK      201410     5      10       20
USA     201409     5      12       19
>>> df2 = pd.pivot_table(df, index=['country', 'month'], values='ID', aggfunc=len).rename(columns={'ID': 'count'})
>>> df2
                count
country month
Canada  201411      1
UK      201410      1
USA     201409      2
>>> pd.concat([df1, df2], axis=1)
                ebit  profit  revenue  count
country month
Canada  201411     5      10       15      1
UK      201410     5      10       20      1
USA     201409     5      12       19      2
UPDATE
It can be done in one line using pivot_table and providing a dict of functions to apply to each column in the aggfunc argument:
pd.pivot_table(
    df,
    index=['country', 'month'],
    aggfunc={'revenue': 'sum', 'profit': 'sum', 'ebit': 'sum', 'ID': len}
).rename(columns={'ID': 'count'})
                count  ebit  profit  revenue
country month
Canada  201411      1     5      10       15
UK      201410      1     5      10       20
USA     201409      2     5      12       19
The following solution seems the simplest.
Group by country and month:
grouped_df = df.groupby(['country', 'month'])
Apply sum to columns of interest (revenue, profit, ebit):
final = grouped_df[['revenue', 'profit', 'ebit']].agg('sum')
Assign the size of the grouped_df to a new column in 'final':
final['count'] = grouped_df.size()
print(final)
                revenue  profit  ebit  count
country month
Canada  201411       15      10     5      1
UK      201410       20      10     5      1
USA     201409       19      12     5      2
All done!
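The result above keeps country and month in the index, whereas the output requested in the question shows them as ordinary columns. A minimal sketch of the same steps, assuming a flat table is wanted, adds reset_index() at the end. The name flat is illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [234, 344, 532, 129],
    "country": ["USA", "USA", "UK", "Canada"],
    "month": [201409, 201409, 201410, 201411],
    "revenue": [10, 9, 20, 15],
    "profit": [5, 7, 10, 10],
    "ebit": [3, 2, 5, 5],
})

grouped_df = df.groupby(["country", "month"])

# Sum the columns of interest, then attach the number of rows per group.
final = grouped_df[["revenue", "profit", "ebit"]].agg("sum")
final["count"] = grouped_df.size()

# Move the group keys out of the index and back into regular columns.
flat = final.reset_index()
print(flat)
```

Rows come out sorted by the group keys (Canada, UK, USA) rather than in the order they first appear in the data.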