I am using the following code to normalize a numeric pandas data frame.
df_norm = (input_df - input_df.mean()) / (input_df.max() - input_df.min())
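Written out per column, the line above is the usual mean normalization. In the notation below, which is added here only for illustration, j indexes columns and i indexes rows:

```latex
x'_{ij} = \frac{x_{ij} - \bar{x}_j}{\max_i x_{ij} - \min_i x_{ij}},
\qquad \bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}
```

Each normalized column therefore has mean 0 and a spread (max minus min) of exactly 1, so its values fall in [-1, 1]. A constant column gives 0/0, which pandas returns as NaN.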
Now I have a new data frame whose first two columns contain strings. I want to skip those two columns and normalize the rest of the data frame. Is there a way to reuse the above code with a small modification? Thanks!
You can take a slice from the third column (positional index 2) onwards and normalize just that slice:
s0 = input_df.iloc[:, 2:]   # all rows, columns from position 2 onwards (the numeric part)
input_df.iloc[:, 2:] = (s0 - s0.mean()) / (s0.max() - s0.min())   # overwrite them in input_df
Sample run:
In [274]: input_df
Out[274]:
      0     1         2         3
0  foo1  doo1  0.880515  0.307642
1  foo2  doo2  0.774307  0.229650
2  foo3  doo3  0.189846  0.283218
In [275]: s0 = input_df.iloc[:,2:]
...: input_df.iloc[:,2:] = (s0 - s0.mean()) / (s0.max() - s0.min())
...:
In [276]: input_df
Out[276]:
      0     1          2          3
0  foo1  doo1   0.384592   0.437719
1  foo2  doo2   0.230817  -0.562281
2  foo3  doo3  -0.615408   0.124563
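The positional slice assumes the string columns are always the first two. If that may not hold, the following is a minimal sketch of a variant that picks the numeric columns by dtype instead, using DataFrame.select_dtypes. The helper name normalize_numeric is hypothetical, and the sample values are typed in from the session above:

```python
import pandas as pd


def normalize_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with every numeric column
    mean-normalized; non-numeric columns are passed through unchanged,
    wherever they sit in the frame."""
    out = df.copy()                                    # leave the caller's frame alone
    num_cols = out.select_dtypes(include="number").columns
    s0 = out[num_cols]                                 # numeric part only
    out[num_cols] = (s0 - s0.mean()) / (s0.max() - s0.min())
    return out


# Same layout as the sample run: two string columns, then two float columns.
input_df = pd.DataFrame({
    0: ["foo1", "foo2", "foo3"],
    1: ["doo1", "doo2", "doo3"],
    2: [0.880515, 0.774307, 0.189846],
    3: [0.307642, 0.229650, 0.283218],
})

df_norm = normalize_numeric(input_df)
print(df_norm)
```

Because it works on a copy, input_df keeps its original values, and the string columns come through untouched whatever their position.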
Alternatively, to leave input_df untouched and get a new DataFrame, split off the first two columns and concatenate the normalized rest back onto them:
import pandas as pd
ss, s0 = input_df.iloc[:, :2], input_df.iloc[:, 2:]   # string columns, numeric columns
df_out = pd.concat([ss, (s0 - s0.mean()) / (s0.max() - s0.min())], axis=1)
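The concatenation version is not shown running above. The following minimal, self-contained sketch reuses the three sample rows from the session (typed in by hand) and splits the frame with positional iloc slicing:

```python
import pandas as pd

# Sample frame with the same layout as in the question (values from the session above).
input_df = pd.DataFrame({
    0: ["foo1", "foo2", "foo3"],
    1: ["doo1", "doo2", "doo3"],
    2: [0.880515, 0.774307, 0.189846],
    3: [0.307642, 0.229650, 0.283218],
})

# Split positionally: the first two (string) columns, then the numeric rest.
ss, s0 = input_df.iloc[:, :2], input_df.iloc[:, 2:]

# Normalize only the numeric part and glue the pieces back side by side.
df_out = pd.concat([ss, (s0 - s0.mean()) / (s0.max() - s0.min())], axis=1)

print(df_out)
print(input_df)  # still holds the original, un-normalized values
```

pd.concat with axis=1 aligns the two pieces on their shared row index, so the column order 0, 1, 2, 3 is preserved in df_out.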