 

Apply StandardScaler to parts of a data set

I want to use sklearn's StandardScaler. Is it possible to apply it to some feature columns but not others?

For instance, say my data is:

```python
import pandas as pd

data = pd.DataFrame({'Name' : [3, 4, 6], 'Age' : [18, 92, 98], 'Weight' : [68, 59, 49]})
```

```
   Age  Name  Weight
0   18     3      68
1   92     4      59
2   98     6      49
```

```python
col_names = ['Name', 'Age', 'Weight']
features = data[col_names]
```

I fit and transform the data

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler().fit(features.values)
features = scaler.transform(features.values)
scaled_features = pd.DataFrame(features, columns = col_names)
```

```
       Name       Age    Weight
0 -1.069045 -1.411004  1.202703
1 -0.267261  0.623041  0.042954
2  1.336306  0.787964 -1.245657
```

But of course the names are not really integers but strings, and I don't want to standardize them. How can I apply the fit and transform methods only to the columns Age and Weight?

asked Jul 17 '16 at 11:07 by mitsi



People also ask

How does StandardScaler change the data?

StandardScaler removes the mean and scales each feature (variable) to unit variance, i.e. it computes z = (x - u) / s for every value, where u and s are the mean and standard deviation of that feature's column. Each feature is processed independently. Because it relies on the empirical mean and standard deviation of each feature, StandardScaler is sensitive to outliers, if the dataset contains any.
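As a quick check of that description, here is a minimal sketch (assuming NumPy and scikit-learn are installed) that standardizes the question's Age and Weight values both with StandardScaler and by hand with z = (x - mean) / std. StandardScaler uses the population standard deviation (ddof=0), which is also NumPy's default, so the two results should agree.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Age and Weight values from the question: two features on very different scales
X = np.array([[18.0, 68.0],
              [92.0, 59.0],
              [98.0, 49.0]])

# StandardScaler learns one mean and one standard deviation per column...
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

# ...and applies z = (x - mean) / std column by column (population std, ddof=0)
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_scaled, X_manual))  # True
print(X_scaled.mean(axis=0))            # approximately 0 for each column
print(X_scaled.std(axis=0))             # approximately 1 for each column
```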

When should we use StandardScaler?

StandardScaler is useful when the features of the input dataset have very different ranges, or are simply measured in different units. It removes the mean and scales each feature to unit variance, so all features end up on a comparable scale.


1 Answer

ColumnTransformer, introduced in scikit-learn v0.20, applies transformers to a specified set of columns of an array or pandas DataFrame.

```python
import pandas as pd

data = pd.DataFrame({'Name' : [3, 4, 6], 'Age' : [18, 92, 98], 'Weight' : [68, 59, 49]})
col_names = ['Name', 'Age', 'Weight']
features = data[col_names]

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

ct = ColumnTransformer([
        ('somename', StandardScaler(), ['Age', 'Weight'])
    ], remainder='passthrough')

ct.fit_transform(features)
```

NB: Like Pipeline (which has make_pipeline), it also has a shorthand function, make_column_transformer, which doesn't require naming the transformers.
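A minimal sketch of the shorthand version, assuming a reasonably recent scikit-learn where each argument is a (transformer, columns) tuple (very early releases of make_column_transformer used the reverse order). The step name is generated automatically from the class name, here 'standardscaler', and the result should be the same array as the ColumnTransformer call above.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import StandardScaler

data = pd.DataFrame({'Name': [3, 4, 6], 'Age': [18, 92, 98], 'Weight': [68, 59, 49]})

# Same transformer as above, without naming the step:
# each positional argument is a (transformer, columns) tuple
ct = make_column_transformer(
    (StandardScaler(), ['Age', 'Weight']),
    remainder='passthrough',  # keep 'Name' unscaled
)

scaled = ct.fit_transform(data)
print(scaled)
```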

Output:

```
array([[-1.41100443,  1.20270298,  3.        ],
       [ 0.62304092,  0.04295368,  4.        ],
       [ 0.78796352, -1.24565666,  6.        ]])
```
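In that output, fit_transform returns a plain NumPy array in which the scaled columns come first and the passthrough column (Name) is appended last. If you would rather keep the original DataFrame layout, one alternative (plain pandas plus StandardScaler, not the approach of the answer above) is to scale only the chosen columns and assign them back. The variable name num_cols is purely illustrative.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

data = pd.DataFrame({'Name': [3, 4, 6], 'Age': [18, 92, 98], 'Weight': [68, 59, 49]})
num_cols = ['Age', 'Weight']  # illustrative: the columns to standardize

scaled = data.copy()
# Fit on the selected columns only and write the scaled values back;
# 'Name' and the original column order are left untouched
scaled[num_cols] = StandardScaler().fit_transform(scaled[num_cols])
print(scaled)
```

The trade-off is that this does not give you a single fitted estimator you can drop into a Pipeline, which is where ColumnTransformer is more convenient.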
answered Sep 21 '22 at 20:09 by Guy C
