Trainable sklearn StandardScaler for R

Is there something similar in R that lets me fit a StandardScaler (producing features with mean 0 and standard deviation 1) on the training data and then use that fitted scaler to transform the test data? As far as I can tell, scale does not offer a direct way to transform test data using the mean and standard deviation of the training data.

Snippet for Python:

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)

X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

I'm pretty sure this is the right way to do it (it avoids leaking information from the test set into the training set), so I guess there is a simple solution that I'm just unable to find.
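For readers unfamiliar with the fit/transform split, here is a minimal plain-Python sketch of what the snippet above computes. The helper names fit_scaler and transform are hypothetical (they are not part of sklearn), the data is made up, and the sketch assumes no column has zero variance.

```python
from statistics import fmean, pstdev


def fit_scaler(rows):
    """Return per-column (means, stds) computed from the given rows only."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Population standard deviation (denominator n), as StandardScaler uses.
    stds = [pstdev(col) for col in columns]
    return means, stds


def transform(rows, means, stds):
    """Standardize rows with previously fitted statistics."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Made-up data: three training rows and one test row, two features each.
X_train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
X_test = [[4.0, 40.0]]

# Statistics come from the training rows only ...
means, stds = fit_scaler(X_train)

# ... and are then reused unchanged for both sets.
X_train_scaled = transform(X_train, means, stds)
X_test_scaled = transform(X_test, means, stds)
```

The test rows never influence means or stds, which is the point of fitting on the training data alone; as a consequence the scaled test set will generally not have mean 0 and standard deviation 1 itself.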

Boern asked Mar 13 '18 16:03


1 Answer

I believe that the scale function in R does what you are looking for. For your example, that would just be

X_train_scaled = scale(X_train)

Then you can apply the training set's mean and sd to your test set. scale stores them as attributes on its result, which you can read with attr:

X_test_scaled = scale(X_test, center=attr(X_train_scaled, "scaled:center"), 
                              scale=attr(X_train_scaled, "scaled:scale"))

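This applies the same transformation as the example you posted, with one caveat: scale divides by the sample standard deviation (denominator n - 1), whereas StandardScaler uses the population standard deviation (denominator n), so the scaled values differ by a constant factor of sqrt((n - 1)/n).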
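One detail worth checking when comparing numbers between the two languages is which standard deviation is used: R's scale divides by the sample sd (denominator n - 1), while StandardScaler divides by the population sd (denominator n). The following plain-Python sketch, on made-up single-column data, illustrates the resulting constant factor; it imitates both conventions with the statistics module and calls neither R nor sklearn.

```python
from math import isclose, sqrt
from statistics import fmean, pstdev, stdev

# Made-up single-feature data.
train = [1.0, 2.0, 3.0, 4.0]
test = [5.0, 6.0]

n = len(train)
center = fmean(train)

# Sample sd (denominator n - 1), the convention R's scale() follows.
r_style = [(x - center) / stdev(train) for x in test]

# Population sd (denominator n), the convention StandardScaler follows.
sk_style = [(x - center) / pstdev(train) for x in test]

# The two results differ only by the constant factor sqrt((n - 1) / n).
factor = sqrt((n - 1) / n)
same_up_to_factor = all(
    isclose(r, s * factor) for r, s in zip(r_style, sk_style)
)
```

The factor approaches 1 as n grows, so the difference is usually negligible for large training sets but can be visible when comparing small examples value by value.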

sacuL answered Nov 15 '22 13:11