 

Normalize a pandas data frame but skip a few columns

I am using the following code to normalize a numeric pandas data frame.

df_norm = (input_df - input_df.mean()) / (input_df.max() - input_df.min())
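The formula above is often called mean normalization: each column is shifted by its own mean and divided by its own range, so afterwards every column has mean 0 and a max-min spread of 1. A quick sketch on a small, made-up numeric frame (the column names and values are illustrative only):

```python
import pandas as pd

# Made-up, all-numeric frame for illustration.
input_df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 6.0], "b": [10.0, 0.0, 5.0, 5.0]})

# Column-wise: subtract the column mean, divide by the column range.
df_norm = (input_df - input_df.mean()) / (input_df.max() - input_df.min())

print(df_norm)
```

Column `a` has mean 3 and range 5, so it becomes [-0.4, -0.2, 0.0, 0.6].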

Now I have a new data frame whose first two columns are strings. I want to skip those two columns and normalize the rest of the data frame. Is there a way to reuse the above code with a small modification? Thanks!

Edamame asked Mar 08 '23 23:03



1 Answer

You can take a slice from the third column (position 2) onwards and normalize only that part -

s0 = input_df.iloc[:,2:]  # numeric columns only (position 2 onwards)
input_df.iloc[:,2:] = (s0 - s0.mean()) / (s0.max() - s0.min())  # overwrite them in place
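Since the question asks for a reusable tweak, the same two lines can be wrapped in a small function. In this sketch the name `normalize_skip_first` and its `skip` parameter (how many leading columns to leave alone) are hypothetical, and it assumes every column after the first `skip` ones is a float column. It works on a copy, so the caller's frame is not modified:

```python
import pandas as pd


def normalize_skip_first(df: pd.DataFrame, skip: int = 2) -> pd.DataFrame:
    """Mean-normalize every column after the first `skip` columns.

    Hypothetical helper: returns a new frame and leaves `df` untouched.
    Assumes the columns from position `skip` onwards are float columns.
    """
    out = df.copy()
    s0 = out.iloc[:, skip:]
    out.iloc[:, skip:] = (s0 - s0.mean()) / (s0.max() - s0.min())
    return out


# Made-up example: two string columns followed by two float columns.
df = pd.DataFrame({
    "name": ["x", "y", "z"],
    "tag": ["p", "q", "r"],
    "v1": [1.0, 2.0, 4.0],
    "v2": [3.0, 3.5, 5.0],
})
result = normalize_skip_first(df, skip=2)
print(result)
```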

Sample run -

In [274]: input_df
Out[274]: 
      0     1         2         3
0  foo1  doo1  0.880515  0.307642
1  foo2  doo2  0.774307  0.229650
2  foo3  doo3  0.189846  0.283218

In [275]: s0 = input_df.iloc[:,2:]
     ...: input_df.iloc[:,2:] = (s0 - s0.mean()) / (s0.max() - s0.min())
     ...: 

In [276]: input_df
Out[276]: 
      0     1         2         3
0  foo1  doo1  0.384592  0.437719
1  foo2  doo2  0.230817 -0.562281
2  foo3  doo3 -0.615408  0.124563

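The sample run can be reproduced as a standalone script. This sketch rebuilds `input_df` from the values printed in `Out[274]`; because those printed values are already rounded to six decimals, the results match `Out[276]` only up to that rounding:

```python
import pandas as pd

# Frame rebuilt from the printed sample values (integer column labels 0..3).
input_df = pd.DataFrame({
    0: ["foo1", "foo2", "foo3"],
    1: ["doo1", "doo2", "doo3"],
    2: [0.880515, 0.774307, 0.189846],
    3: [0.307642, 0.229650, 0.283218],
})

# Same two lines as in the answer: normalize columns at position 2 onwards.
s0 = input_df.iloc[:, 2:]
input_df.iloc[:, 2:] = (s0 - s0.mean()) / (s0.max() - s0.min())

print(input_df.round(6))
```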
Alternatively, to build a new data frame and leave input_df untouched, we could split off the numeric part, normalize it and concatenate -

import pandas as pd

ss, s0 = input_df.iloc[:, :2], input_df.iloc[:, 2:]  # string part, numeric part
df_out = pd.concat([ss, (s0 - s0.mean()) / (s0.max() - s0.min())], axis=1)
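If the string columns are not guaranteed to come first, one option (a swapped-in technique, not part of the answer above) is to pick the columns by dtype with `DataFrame.select_dtypes` instead of by position. The helper name `normalize_numeric` is hypothetical; the sketch keeps the original column order and returns a new frame:

```python
import pandas as pd


def normalize_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Mean-normalize every numeric column; other columns are copied as-is.

    Hypothetical helper: selects columns by dtype rather than by position.
    """
    out = df.copy()
    num_cols = out.select_dtypes(include="number").columns
    s0 = out[num_cols]
    # Assigning whole columns replaces them, so int columns become float.
    out[num_cols] = (s0 - s0.mean()) / (s0.max() - s0.min())
    return out


# Made-up example with string and numeric columns interleaved.
df = pd.DataFrame({
    "id": ["a", "b", "c"],
    "x": [1, 2, 3],
    "label": ["u", "v", "w"],
    "y": [0.5, 1.5, 4.0],
})
result = normalize_numeric(df)
print(result)
```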
Divakar answered Apr 27 '23 18:04
