Group by Sum as new column name

Tags:

python

pandas

I am writing a function that groups by ID and sums the $ value associated with each ID, using this Python code:

df = df.groupby([' Id'], as_index=False, sort=False)[["Amount"]].sum();

but it doesn't rename the column, so I tried this:

df = df.groupby([' Id'], as_index=False, sort=False)[["Amount"]].sum().reset_index(name='Total Amount')

but it raised this error: TypeError: reset_index() got an unexpected keyword argument 'name'

Finally, following the post "Python Pandas Create New Column with Groupby().Sum()", I tried this:

df = df.groupby(['Id'])[["Amount"]].transform('sum'); 

but it still didn't work.

What am I doing wrong?

Adam asked Jul 16 '17 04:07


People also ask

How do I change the column name after Groupby in pandas?

One way to rename columns in a pandas DataFrame is the rename() method.
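Applied to the question's case, a minimal sketch keeps the asker's as_index=False and double-bracket selection (so the result is a DataFrame) and simply renames the summed column afterwards. The sample data is made up for illustration:

```python
import pandas as pd

# Hypothetical sample data modelled on the question's Id/Amount columns
df = pd.DataFrame({"Id": [1, 2, 2], "Amount": [10, 30, 50]})

# as_index=False + [["Amount"]] gives a DataFrame with Id and Amount columns;
# DataFrame.rename then changes the summed column's label
totals = (
    df.groupby("Id", as_index=False, sort=False)[["Amount"]]
    .sum()
    .rename(columns={"Amount": "Total Amount"})
)
print(totals)
```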

How do I create a new column from the output of pandas groupby().sum()?

To create a new column from the output of groupby().sum(), first apply the groupby().sum() operation, then store the result in a new column.
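A minimal sketch of that two-step recipe, using Series.map to broadcast each group's total back onto its rows. This is one way to do it; transform('sum'), used in the first answer below, is another. The data is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"Id": [1, 2, 2], "Amount": [10, 30, 50]})

# Step 1: one total per Id (a Series whose index is the Id values)
sums = df.groupby("Id")["Amount"].sum()

# Step 2: look up each row's Id in that Series and store it as a new column
df["Total Amount"] = df["Id"].map(sums)
print(df)
```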

How do I get a new name to a specific column?

Select a column, and then select Transform > Rename. You can also double-click the column header. Enter the new name.

How do you sum by Groupby?

Use DataFrame.groupby().sum() to group rows by one or more columns and compute the sum for each group. groupby() returns a DataFrameGroupBy object whose sum() method calculates the sum of the selected column(s) within each group.
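A short sketch with two grouping keys and two summed columns. The Region and Qty columns are hypothetical, added only for illustration:

```python
import pandas as pd

# Hypothetical data: sum Amount and Qty for each (Id, Region) pair
df = pd.DataFrame({
    "Id": [1, 1, 2, 2],
    "Region": ["N", "N", "N", "S"],
    "Amount": [10, 20, 30, 50],
    "Qty": [1, 2, 3, 4],
})

# Group on two key columns; as_index=False keeps the keys as ordinary columns
result = df.groupby(["Id", "Region"], as_index=False)[["Amount", "Qty"]].sum()
print(result)
```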


2 Answers

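I think you need to remove the as_index=False parameter, select the column with single brackets, and use Series.reset_index. With as_index=False (or with double brackets, [["Amount"]]) the result is a DataFrame, and DataFrame.reset_index has no name parameter, hence the TypeError. Use this instead: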

df = df.groupby('Id', sort=False)["Amount"].sum().reset_index(name ='Total Amount')

Or rename the column first:

d = {'Amount':'Total Amount'}
df = df.rename(columns=d).groupby('Id', sort=False, as_index=False)["Total Amount"].sum()

Sample:

import pandas as pd

df = pd.DataFrame({'Id':[1,2,2],'Amount':[10, 30,50]})
print (df)
   Amount  Id
0      10   1
1      30   2
2      50   2

df1 = df.groupby('Id', sort=False)["Amount"].sum().reset_index(name ='Total Amount')
print (df1)
   Id  Total Amount
0   1            10
1   2            80

d = {'Amount':'Total Amount'}
df1 = df.rename(columns=d).groupby('Id', sort=False, as_index=False)["Total Amount"].sum()
print (df1)
   Id  Total Amount
0   1            10
1   2            80
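To see why the question's version raised the TypeError, a quick check of the return types on the same sample data may help. Only the Series result accepts name=:

```python
import pandas as pd

df = pd.DataFrame({"Id": [1, 2, 2], "Amount": [10, 30, 50]})

# Double brackets (as in the question) -> a DataFrame
as_frame = df.groupby("Id", sort=False)[["Amount"]].sum()
# Single brackets, no as_index=False -> a Series
as_series = df.groupby("Id", sort=False)["Amount"].sum()

print(type(as_frame).__name__)   # DataFrame
print(type(as_series).__name__)  # Series

# Series.reset_index(name=...) names the values column
print(as_series.reset_index(name="Total Amount"))

# DataFrame.reset_index has no 'name' keyword, so this raises TypeError
try:
    as_frame.reset_index(name="Total Amount")
except TypeError as exc:
    print("TypeError:", exc)
```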

But if you need a new column with the sum in the original df, use transform and assign its output to a new column:

df['Total Amount'] = df.groupby('Id', sort=False)["Amount"].transform('sum')
print (df)
   Amount  Id  Total Amount
0      10   1            10
1      30   2            80
2      50   2            80
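As for why the question's transform attempt seemed not to work, one plausible cause (an assumption, since the question doesn't show its output) is the assignment target: df = ... replaces the whole frame with transform's one-column result. A sketch of the difference:

```python
import pandas as pd

df = pd.DataFrame({"Id": [1, 2, 2], "Amount": [10, 30, 50]})

# transform('sum') returns one value per ORIGINAL row, aligned on df's index;
# with double brackets it is a one-column DataFrame labelled "Amount"
out = df.groupby("Id")[["Amount"]].transform("sum")
print(out)

# Assigning `out` back to df would drop Id and the original amounts,
# so assign the single-bracket (Series) result to a new column instead
df["Total Amount"] = df.groupby("Id")["Amount"].transform("sum")
print(df)
```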
jezrael answered Oct 23 '22 17:10


import pandas as pd

# set up dataframe
df = pd.DataFrame({'colA':['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd'], 
                   'colB':['cat', 'cat', 'dog', 'cat', 'dog', 'cat', 'cat', 'dog'],
                   'colC':[1,2,3,4,4,5,6,7], })

print(df)

  colA colB  colC
0    a  cat     1
1    a  cat     2
2    a  dog     3
3    b  cat     4
4    b  dog     4
5    c  cat     5
6    c  cat     6
7    d  dog     7 



# group on vals in column A
# get min (within groups) for column B 
# get avg (within groups) for column C
df_agg = ( df.groupby(by=['colA'])
          .agg({'colB':'min', 'colC':'mean'})
          .rename(columns={'colB':'colB_grp_min', 'colC':'colC_grp_avg'})
          )

print(df_agg)

     colB_grp_min  colC_grp_avg
colA
a             cat           2.0
b             cat           4.0
c             cat           5.5
d             dog           7.0
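On pandas 0.25 or later, named aggregation should produce the same renamed columns in one step, without the separate rename call. A sketch reusing the sample frame above (this is an alternative technique, not the answer's original code):

```python
import pandas as pd

df = pd.DataFrame({'colA': ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd'],
                   'colB': ['cat', 'cat', 'dog', 'cat', 'dog', 'cat', 'cat', 'dog'],
                   'colC': [1, 2, 3, 4, 4, 5, 6, 7]})

# Named aggregation: output_name=(input_column, aggfunc)
df_agg = df.groupby('colA').agg(
    colB_grp_min=('colB', 'min'),
    colC_grp_avg=('colC', 'mean'),
)
print(df_agg)
```

For the question's single-column case the same idea would be df.groupby('Id', sort=False).agg(**{'Total Amount': ('Amount', 'sum')}).reset_index(); the dict unpacking is needed because 'Total Amount' contains a space and so cannot be written as a plain keyword argument.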



# if you want multiple aggregations on the same column, pass a list
#   the result then has MultiIndex columns
# group on vals in column A
# get min (within groups) for column B 
# get avg and max (within groups) for column C
df_agg2 = ( df.groupby(by=['colA'])
          .agg({'colB':'min', 'colC':['mean', 'max']})
          .rename(columns={'colB':'colB_grp_min', 'colC':'colC_grp_multi_index'})
          )
print(df_agg2)

     colB_grp_min colC_grp_multi_index    
              min                 mean max
colA                                      
a             cat                  2.0   3
b             cat                  4.0   4
c             cat                  5.5   6
d             dog                  7.0   7
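If flat column names are preferred over the MultiIndex, one common approach (not part of the original answer) is to join the two column levels. A sketch reusing the same sample frame:

```python
import pandas as pd

df = pd.DataFrame({'colA': ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd'],
                   'colB': ['cat', 'cat', 'dog', 'cat', 'dog', 'cat', 'cat', 'dog'],
                   'colC': [1, 2, 3, 4, 4, 5, 6, 7]})

df_agg2 = df.groupby(by=['colA']).agg({'colB': 'min', 'colC': ['mean', 'max']})

# Each column label is a tuple such as ('colC', 'mean'); join it into 'colC_mean'
df_agg2.columns = ['_'.join(col) for col in df_agg2.columns]
print(df_agg2)
```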

Data-phile answered Oct 23 '22 17:10