Pandas Normalize Using Mean Normalization
To normalize all columns of a pandas DataFrame, subtract each column's mean and divide by its standard deviation. Because pandas' std() uses ddof=1 by default, this divides by the unbiased sample standard deviation. Alternatively, you can get the same result using DataFrame.apply() with a lambda.
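A minimal sketch of the pandas-only approach described above, assuming a small all-numeric DataFrame. The name `df` and the two columns are illustrative; the values are reused from the examples further down.

```python
import pandas as pd

# Illustrative numeric data (values reused from the examples in this thread).
df = pd.DataFrame({
    'A': [14.00, 90.20, 90.95, 96.27, 91.21],
    'B': [103.02, 107.26, 110.35, 114.23, 114.68],
})

# Column-wise: subtract the mean, divide by the sample standard deviation.
# DataFrame.std() uses ddof=1 by default.
normalized = (df - df.mean()) / df.std()

# The same computation via apply() and a lambda, one column at a time.
normalized_apply = df.apply(lambda col: (col - col.mean()) / col.std())

print(normalized)
```

After this, every column has mean 0 and sample standard deviation 1, up to floating-point error.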
I am not sure whether previous versions of pandas prevented this, but the following snippet now works perfectly for me and produces exactly what you want, without having to use apply:
>>> import pandas as pd
>>> from sklearn.preprocessing import MinMaxScaler
>>> scaler = MinMaxScaler()
>>> dfTest = pd.DataFrame({'A':[14.00,90.20,90.95,96.27,91.21],
...                        'B':[103.02,107.26,110.35,114.23,114.68],
...                        'C':['big','small','big','small','small']})
>>> dfTest[['A', 'B']] = scaler.fit_transform(dfTest[['A', 'B']])
>>> dfTest
          A         B      C
0  0.000000  0.000000    big
1  0.926219  0.363636  small
2  0.935335  0.628645    big
3  1.000000  0.961407  small
4  0.938495  1.000000  small
Like this?
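dfTest = pd.DataFrame({
    'A':[14.00,90.20,90.95,96.27,91.21],
    'B':[103.02,107.26,110.35,114.23,114.68],
    'C':['big','small','big','small','small']
})
dfTest[['A','B']] = dfTest[['A','B']].apply(
    # scikit-learn scalers expect 2-D input, so reshape each column and flatten the result
    lambda x: MinMaxScaler().fit_transform(x.to_numpy().reshape(-1, 1)).ravel())
dfTest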
          A         B      C
0  0.000000  0.000000    big
1  0.926219  0.363636  small
2  0.935335  0.628645    big
3  1.000000  0.961407  small
4  0.938495  1.000000  small
df = pd.DataFrame(scale.fit_transform(df.values), columns=df.columns, index=df.index)
This should work without deprecation warnings.
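The one-liner above does not define `scale` or `df`. Here is a self-contained sketch of the same pattern, assuming `scale` is a StandardScaler (any scikit-learn scaler could stand in) and `df` is an all-numeric DataFrame. The row labels are hypothetical and only there to show that the index survives the round trip.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical all-numeric frame with a non-default index.
df = pd.DataFrame(
    {'A': [14.00, 90.20, 90.95, 96.27, 91.21],
     'B': [103.02, 107.26, 110.35, 114.23, 114.68]},
    index=['r1', 'r2', 'r3', 'r4', 'r5'],
)

scale = StandardScaler()  # assumption: the scaler is not named in the original line

# fit_transform returns a plain NumPy array, so the column names and index
# are passed back in when rebuilding the DataFrame.
scaled = pd.DataFrame(scale.fit_transform(df.values),
                      columns=df.columns, index=df.index)

print(scaled)
```

Rebuilding the DataFrame this way keeps the labels aligned with the original, which matters if the result is later joined back to other columns.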
As mentioned in pir's comment, the .apply(lambda el: scale.fit_transform(el)) method produces the following warning:
DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
Converting your columns to numpy arrays should do the job (I prefer StandardScaler):
from sklearn.preprocessing import StandardScaler
scale = StandardScaler()
# Column 'C' holds strings, so only the numeric columns are scaled.
dfTest[['A','B']] = scale.fit_transform(dfTest[['A','B']].as_matrix())
-- Edit Nov 2018 (Tested for pandas 0.23.4) --
As Rob Murray mentions in the comments, in the current (v0.23.4) version of pandas, .as_matrix() raises a FutureWarning. It should therefore be replaced by .values:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit_transform(dfTest[['A','B']].values)
-- Edit May 2019 (Tested for pandas 0.24.2) --
As joelostblom mentions in the comments, "Since 0.24.0, it is recommended to use .to_numpy() instead of .values."
Updated example:
import pandas as pd
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
dfTest = pd.DataFrame({
    'A':[14.00,90.20,90.95,96.27,91.21],
    'B':[103.02,107.26,110.35,114.23,114.68],
    'C':['big','small','big','small','small']
})
dfTest[['A', 'B']] = scaler.fit_transform(dfTest[['A','B']].to_numpy())
dfTest
          A         B      C
0 -1.995290 -1.571117    big
1  0.436356 -0.603995  small
2  0.460289  0.100818    big
3  0.630058  0.985826  small
4  0.468586  1.088469  small
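The standardized values above differ slightly from what (df - df.mean()) / df.std() gives in pandas. StandardScaler divides by the population standard deviation (ddof=0), while pandas defaults to the sample standard deviation (ddof=1). A small sketch of the comparison, with illustrative variable names:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Numeric columns from the example above.
num = pd.DataFrame({
    'A': [14.00, 90.20, 90.95, 96.27, 91.21],
    'B': [103.02, 107.26, 110.35, 114.23, 114.68],
})

sk = StandardScaler().fit_transform(num.to_numpy())

# pandas equivalent of StandardScaler: population standard deviation.
pandas_population = (num - num.mean()) / num.std(ddof=0)

# pandas default: sample standard deviation, so values are slightly smaller in magnitude.
pandas_sample = (num - num.mean()) / num.std()

print(np.allclose(sk, pandas_population.to_numpy()))
print(np.allclose(sk, pandas_sample.to_numpy()))
```

With only five rows the two conventions differ by a factor of sqrt(5/4), which is why the choice is visible here; on large datasets the gap becomes negligible.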