Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Normalize rows of pandas data frame by their sums [duplicate]

Tags:

python

pandas

I have a pandas dataframe containing spectral data and metadata. The columns are labeled with a multiindex so that df['wvl'] gives the spectra and df['meta'] gives the metadata. Within df['wvl'] the column labels are the wavelength values for the spectrometer channels.

What I want to do is normalize each row of df['wvl'] by the sum of that row so that adding up the values in the row gives a total of 1.0.

Here's what one row of the dataframe looks like:

df['wvl'].iloc[0]
246.050003     128.533035
246.102005     102.756321
246.156006      99.930775
...    
848.697205     121.313347
848.896423     127.011662
849.095703     123.234168
Name: 0, dtype: float64

But when I do something like:

df['wvl'].iloc[0]=df['wvl'].iloc[0]/df['wvl'].iloc[0].sum()

Nothing happens! I get the exact same values:

df['wvl'].iloc[0]
246.050003     128.533035
246.102005     102.756321
246.156006      99.930775
...    
848.697205     121.313347
848.896423     127.011662
849.095703     123.234168
Name: 0, dtype: float64

If I create a temporary variable to hold the row, I can do the normalization just fine:

temp=df['wvl'].iloc[0]

temp=temp/temp.sum()

temp
246.050003    0.000027
246.102005    0.000022
246.156006    0.000021
                ...   
848.697205    0.000026
848.896423    0.000027
849.095703    0.000026
Name: 0, dtype: float64

But if I try to replace the dataframe row with the normalized temporary variable, nothing happens:

df['wvl'].iloc[0]=temp

df['wvl'].iloc[0]
246.050003     128.533035
246.102005     102.756321
246.156006      99.930775
                 ...     
848.697205     121.313347
848.896423     127.011662
849.095703     123.234168
Name: 0, dtype: float64

I'm obviously missing something here, but I can't figure out what and it's driving me insane. Help? Thanks in advance!

like image 756
rba Avatar asked Feb 28 '16 04:02

rba


1 Answers

You can use

df.div(df.sum(axis=1), axis=0)

df.sum(axis=1) sums up each row; df.div(..., axis=0) then divides.

Example:

import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
>>> df.div(df.sum(axis=1), axis=0)
    a   b
0   0.250000    0.750000
1   0.333333    0.666667
like image 194
Ami Tavory Avatar answered Oct 15 '22 06:10

Ami Tavory