
Standardize some columns in Python Pandas dataframe?

The Python code below returns only a NumPy array, but I want the scaled values to replace the original columns in the DataFrame.

from sklearn.preprocessing import StandardScaler
df = StandardScaler().fit_transform(df[['cost', 'sales']])
df

Output:

array([[ 1.99987622, -0.55900276],
       [-0.49786658, -0.45658181],
       [-0.5146864 , -0.505097  ],
       [-0.48104676, -0.47814412],
       [-0.50627649,  1.9988257 ]])

Original data:

id  cost    sales   item
1   300       50    pen
2   3         88    bottle
3   1         70    drink
4   5         80    cup
5   2        999    ink
BigData asked Apr 04 '18 02:04



2 Answers

Simply assign the result back to the same columns:

df[['cost', 'sales']] = StandardScaler().fit_transform(df[['cost', 'sales']])
df
Out[45]: 
   id      cost     sales    item
0   1  1.999876 -0.559003     pen
1   2 -0.497867 -0.456582  bottle
2   3 -0.514686 -0.505097   drink
3   4 -0.481047 -0.478144     cup
4   5 -0.506276  1.998826     ink
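
For reference, the values above can be reproduced with plain pandas, which shows what the scaling does: each column has its mean subtracted and is then divided by its population standard deviation (`ddof=0`, unlike the pandas default of `ddof=1`). This is only a minimal sketch on the question's data; the name `cols` is an illustrative assumption, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "cost": [300, 3, 1, 5, 2],
    "sales": [50, 88, 70, 80, 999],
    "item": ["pen", "bottle", "drink", "cup", "ink"],
})

cols = ["cost", "sales"]  # illustrative name for the columns to scale

# z-score: subtract the column mean, divide by the population std (ddof=0)
df[cols] = (df[cols] - df[cols].mean()) / df[cols].std(ddof=0)
print(df)
```

The `id` and `item` columns are left untouched because only the columns listed in `cols` are reassigned.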
BENY answered Oct 21 '22 00:10



Alternatively, if the columns are selected by position rather than by name:

import pandas as pd
from sklearn.preprocessing import StandardScaler
df = pd.DataFrame({"cost": [300,3,1,5,2], "sales": [50,88,70,80,999], "item": ["pen","bottle","drink","cup","ink"]})

# Scale selected columns by index
df.iloc[:, 0:2] = StandardScaler().fit_transform(df.iloc[:, 0:2])

       cost     sales    item
0  1.999876 -0.559003     pen
1 -0.497867 -0.456582  bottle
2 -0.514686 -0.505097   drink
3 -0.481047 -0.478144     cup
4 -0.506276  1.998826     ink
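
Depending on the pandas version, writing float results into integer columns through `iloc` may emit a dtype warning or be rejected. A possible workaround, sketched here as an assumption rather than as part of the original answer, is to translate the positions into column names first and assign by name, which replaces the columns outright. The variable `cols` is hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "cost": [300, 3, 1, 5, 2],
    "sales": [50, 88, 70, 80, 999],
    "item": ["pen", "bottle", "drink", "cup", "ink"],
})

# Translate positions 0 and 1 into column names (illustrative helper variable)
cols = list(df.columns[0:2])

# Assigning by name replaces the columns, so they simply become float columns
df[cols] = StandardScaler().fit_transform(df[cols])
print(df.dtypes)
```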

The fitted scaler object can also be kept, so that "new data" is scaled with the mean and standard deviation learned from the existing data:

df = pd.DataFrame({"cost": [300,3,1,5,2], "sales": [50,88,70,80,999], "item": ["pen","bottle","drink","cup","ink"]})
df_new = pd.DataFrame({"cost": [299,5,12,64,2], "sales": [55,99,48,20,999], "item": ["pen","bottle","drink","cup","ink"]})

# Set up scaler
scaler = StandardScaler().fit(df.iloc[:, 0:2])

# Scale original data
df.iloc[:, 0:2] = scaler.transform(df.iloc[:, 0:2])

# Scale new data 
df_new.iloc[:, 0:2] = scaler.transform(df_new.iloc[:, 0:2])
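
If the scaler has to outlive the current session, one option is to serialize it with the standard-library `pickle` module. The sketch below assumes the same scikit-learn version is used when loading, and that the pickled data comes from a trusted source; the names `blob` and `restored` are illustrative.

```python
import pickle

import pandas as pd
from sklearn.preprocessing import StandardScaler

cols = ["cost", "sales"]
df = pd.DataFrame({"cost": [300, 3, 1, 5, 2], "sales": [50, 88, 70, 80, 999]})
df_new = pd.DataFrame({"cost": [299, 5, 12, 64, 2], "sales": [55, 99, 48, 20, 999]})

# Fit once on the existing data
scaler = StandardScaler().fit(df[cols])

# Serialize the fitted scaler; in practice these bytes would be written to a file
blob = pickle.dumps(scaler)

# Later (for example in another process): restore the scaler and reuse it
restored = pickle.loads(blob)
df_new[cols] = restored.transform(df_new[cols])
print(df_new)
```

The restored scaler carries the mean and scale fitted on `df`, so `df_new` is scaled relative to the original data rather than to its own statistics.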
Peter answered Oct 21 '22 01:10
