 

Divide Column in Pandas Dataframe by Sum of Column

I have a dataframe where I would like to divide each row within column A by the sum of column A and make that a new column within the dataframe.

Example:

        Col A   New Col
        2       .22
        3       .33
        4       .44
Total = 9       1.00

I tried to sum Col A into a 'Total' row and then divide by 'Total', but because Total is a row rather than a column, it did not work: I just get NaN for every row of the new column.

df['New Col']= (df['ColA']/df.loc['Total']) 

I suspect the sum could also be computed inline, in the same line of code, instead of creating a totals row, but I am not sure how to do that and could not find anything online:

df['New Col']= (df['ColA']/df.sum()) 

Ideas?

spacedinosaur10 asked Dec 02 '16 20:12


3 Answers

df['new'] = df['ColA'] /  df['ColA'].sum()

should work: df['ColA'].sum() is a single number, so every row of the column is divided by the same total.
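
A self-contained sketch of this answer, assuming the column is named 'ColA' (as in the line above) and holds the example values 2, 3 and 4 from the question:

```python
import pandas as pd

# Example data from the question; the column name 'ColA' matches the code above.
df = pd.DataFrame({"ColA": [2, 3, 4]})

# df["ColA"].sum() is a scalar (9), so the division is applied
# to every row of the column with the same divisor.
df["new"] = df["ColA"] / df["ColA"].sum()

print(df)
```

Because the divisor is a plain scalar rather than a Series, there are no index labels to align and no NaN appears; the new column adds up to 1, up to floating-point rounding.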

Steven G answered Oct 03 '22 11:10


Another approach is to use transform:

df['New Col'] = df['ColA'].transform(lambda x: x / x.sum())
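
A minimal check, assuming the same three-row example, that transform gives the same values as dividing by the scalar sum directly. Series.pipe is included as a further variant that is not part of this answer; it always calls the function exactly once with the whole Series.

```python
import pandas as pd

df = pd.DataFrame({"ColA": [2, 3, 4]})

# transform must return a result with the same length as its input;
# x / x.sum() on the whole column satisfies that.
via_transform = df["ColA"].transform(lambda x: x / x.sum())

# Direct division by the scalar column sum, for comparison.
direct = df["ColA"] / df["ColA"].sum()

# pipe hands the whole Series to the function in a single call.
via_pipe = df["ColA"].pipe(lambda s: s / s.sum())

df["New Col"] = via_transform
print(via_transform.equals(direct), via_pipe.equals(direct))
```

As far as I can tell from pandas' implementation, transform with a lambda first tries it element by element and falls back to calling it on the whole Series when that fails; for x / x.sum() the element-wise attempt fails on plain numbers, so the whole-column call is what runs. The direct division avoids that detour.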

Clade answered Oct 03 '22 10:10


You are very close. You want to call sum() on the Col A Series rather than on the whole DataFrame:

df['New Col'] = df['Col A']/df['Col A'].sum()

This results in a DataFrame that looks like this:

>>> df
   Col A   New Col
0      2  0.222222
1      3  0.333333
2      4  0.444444
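
To get closer to the layout sketched in the question (two decimals plus a 'Total' row), one option, assuming the row label 'Total' from the question, is to append the column sums with .loc and round only afterwards. The order matters: the rounded shares 0.22 + 0.33 + 0.44 add up to 0.99, not 1.00.

```python
import pandas as pd

df = pd.DataFrame({"Col A": [2, 3, 4]})
df["New Col"] = df["Col A"] / df["Col A"].sum()

# Append a 'Total' row holding the per-column sums (setting with enlargement).
df.loc["Total"] = df.sum()

# Round only the share column, and only after the totals were computed.
display = df.round({"New Col": 2})
print(display)
```

Rounding into a separate frame keeps the full-precision values in df for any further calculation.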

If you call df.sum() on the whole DataFrame instead, you get a Series of per-column totals, indexed by column name:

>>> df.sum()
Col A      9.0
New Col    1.0
dtype: float64
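
This is also why the attempts in the question produced NaN: pandas aligns the two operands of a division on their index labels. The column is labelled by row (0, 1, 2), while df.sum(), and likewise a 'Total' row taken with df.loc, is labelled by column name, so no labels match. A small sketch of this, using the question's example values:

```python
import pandas as pd

df = pd.DataFrame({"Col A": [2, 3, 4]})

totals = df.sum()  # Series indexed by column name: ['Col A']

# The row labels of the column and the labels of `totals` do not overlap.
shared = df["Col A"].index.intersection(totals.index)
print(len(shared))  # 0

# Division aligns on labels first; every label is missing on one side,
# so every value in the result is NaN.
misaligned = df["Col A"] / totals
print(misaligned.isna().all())  # True

# Picking out the scalar total for the column avoids the alignment problem.
fixed = df["Col A"] / totals["Col A"]
```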

Andy answered Oct 03 '22 11:10