pandas dataframe columns scaling with sklearn

People also ask

How do I normalize a column in Pandas Dataframe?

Pandas Normalize Using Mean Normalization To normalize all columns of pandas DataFrame, we simply subtract the mean and divide by standard deviation. This example gives unbiased estimates. Alternatively, you can also get the same using DataFrame. apply() and lambda .

I am not sure if previous versions of pandas prevented this but now the following snippet works perfectly for me and produces exactly what you want without having to use apply

>>> import pandas as pd
>>> from sklearn.preprocessing import MinMaxScaler


>>> scaler = MinMaxScaler()

>>> dfTest = pd.DataFrame({'A':[14.00,90.20,90.95,96.27,91.21],
                           'B':[103.02,107.26,110.35,114.23,114.68],
                           'C':['big','small','big','small','small']})

>>> dfTest[['A', 'B']] = scaler.fit_transform(dfTest[['A', 'B']])

>>> dfTest
          A         B      C
0  0.000000  0.000000    big
1  0.926219  0.363636  small
2  0.935335  0.628645    big
3  1.000000  0.961407  small
4  0.938495  1.000000  small

Like this?

dfTest = pd.DataFrame({
           'A':[14.00,90.20,90.95,96.27,91.21],
           'B':[103.02,107.26,110.35,114.23,114.68], 
           'C':['big','small','big','small','small']
         })
dfTest[['A','B']] = dfTest[['A','B']].apply(
                           lambda x: MinMaxScaler().fit_transform(x))
dfTest

    A           B           C
0   0.000000    0.000000    big
1   0.926219    0.363636    small
2   0.935335    0.628645    big
3   1.000000    0.961407    small
4   0.938495    1.000000    small

df = pd.DataFrame(scale.fit_transform(df.values), columns=df.columns, index=df.index)

This should work without depreciation warnings.

As it is being mentioned in pir's comment - the .apply(lambda el: scale.fit_transform(el)) method will produce the following warning:

DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.

Converting your columns to numpy arrays should do the job (I prefer StandardScaler):

from sklearn.preprocessing import StandardScaler
scale = StandardScaler()

dfTest[['A','B','C']] = scale.fit_transform(dfTest[['A','B','C']].as_matrix())

-- Edit Nov 2018 (Tested for pandas 0.23.4)--

As Rob Murray mentions in the comments, in the current (v0.23.4) version of pandas .as_matrix() returns FutureWarning. Therefore, it should be replaced by .values:

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

scaler.fit_transform(dfTest[['A','B']].values)

-- Edit May 2019 (Tested for pandas 0.24.2)--

As joelostblom mentions in the comments, "Since 0.24.0, it is recommended to use .to_numpy() instead of .values."

Updated example:

import pandas as pd
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
dfTest = pd.DataFrame({
               'A':[14.00,90.20,90.95,96.27,91.21],
               'B':[103.02,107.26,110.35,114.23,114.68],
               'C':['big','small','big','small','small']
             })
dfTest[['A', 'B']] = scaler.fit_transform(dfTest[['A','B']].to_numpy())
dfTest
      A         B      C
0 -1.995290 -1.571117    big
1  0.436356 -0.603995  small
2  0.460289  0.100818    big
3  0.630058  0.985826  small
4  0.468586  1.088469  small

Related questions
                            
                                How to delete a record by id in Flask-SQLAlchemy
                            
                                Reload django object from database
                            
                                How do I abort the execution of a Python script? [duplicate]
                            
                                How do I tell matplotlib that I am done with a plot?
                            
                                On localhost, how do I pick a free port number?
                            
                                How to get a random value from dictionary?
                            
                                check if variable is dataframe
                            
                                Putting an if-elif-else statement on one line?
                            
                                Why is semicolon allowed in this python snippet?
                            
                                How to loop through all but the last item of a list?
                            
                                Creating a simple XML file using python
                            
                                Python Remove last 3 characters of a string
                            
                                How do you catch this exception?
                            
                                Find column whose name contains a specific string
                            
                                Run certain code every n seconds [duplicate]
                            
                                pythonic way to do something N times without an index variable?
                            
                                Unable to import a module that is definitely installed
                            
                                How to update a plot in matplotlib?
                            
                                How do I upgrade the Python installation in Windows 10?
                            
                                Does SQLAlchemy have an equivalent of Django's get_or_create?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

pandas dataframe columns scaling with sklearn

Tags:

python

pandas

dataframe

scikit-learn

People also ask

Recent Activity

Donate For Us