Is there something similar in R that lets me fit a StandardScaler (resulting in features with mean 0 and standard deviation 1) to the training data and then use that scaler model to transform the test data? scale does not seem to offer a way to transform the test data based on the mean and standard deviation of the training data.
Snippet for Python:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
Since I'm pretty sure that this is the right way to do it (it avoids leaking information from the test set into the training set), I guess there is a simple solution that I'm just unable to find.
StandardScaler is a common choice for features that are roughly normally distributed: it centers each feature to mean = 0 and scales it to unit variance. MinMaxScaler may be used when the upper and lower bounds are well known from domain knowledge.
StandardScaler is a widely used default. 🙂 It standardizes a feature by subtracting the mean and then dividing by the standard deviation, which gives the feature unit variance.
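As a rough illustration of the fit-then-transform idea, here is a minimal pure-Python sketch for a single feature column. The helper names fit_standardizer and apply_standardizer and the sample numbers are made up for this example; they are not part of scikit-learn or R. The standard deviation is the population one (divide by n), which is what StandardScaler uses.

```python
from statistics import fmean, pstdev


def fit_standardizer(column):
    """Learn the mean and population standard deviation from training data."""
    return fmean(column), pstdev(column)


def apply_standardizer(column, mean, std):
    """Standardize any column with statistics learned elsewhere."""
    return [(x - mean) / std for x in column]


# Hypothetical single-feature data.
train = [1.0, 2.0, 3.0, 4.0, 5.0]
test = [3.0, 6.0]

# Statistics come from the training data only ...
mean, std = fit_standardizer(train)

# ... and are reused unchanged for the test data, so nothing leaks.
train_scaled = apply_standardizer(train, mean, std)
test_scaled = apply_standardizer(test, mean, std)
```

The scaled training column has mean 0 and standard deviation 1; the scaled test column generally does not, which is expected because it was scaled with the training statistics.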
StandardScaler does not turn the data into a standard normal distribution; it only shifts each feature to mean = 0 and rescales it to unit variance, leaving the shape of the distribution unchanged. MinMaxScaler rescales each feature to a given range, which is [0, 1] by default. Scaling into [-1, 1] based on the largest absolute value is what MaxAbsScaler does.
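For comparison, min-max scaling can be sketched the same way. This is a pure-Python illustration for one feature column, with hypothetical helper names (fit_min_max, apply_min_max) and made-up data; it assumes the training column is not constant, since otherwise the denominator would be zero.

```python
def fit_min_max(column):
    """Learn the minimum and maximum from training data."""
    return min(column), max(column)


def apply_min_max(column, lo, hi, feature_range=(0.0, 1.0)):
    """Map [lo, hi] linearly onto feature_range."""
    a, b = feature_range
    return [a + (x - lo) * (b - a) / (hi - lo) for x in column]


# Hypothetical single-feature data.
train = [10.0, 20.0, 30.0]
test = [15.0, 40.0]

lo, hi = fit_min_max(train)
train_scaled = apply_min_max(train, lo, hi)
test_scaled = apply_min_max(test, lo, hi)
```

Test values that lie outside the training minimum and maximum end up outside the target range (40.0 maps to 1.5 here), which is another consequence of fitting on the training data only.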
I believe that the scale function in R does what you are looking for. For your example, that would just be
X_train_scaled = scale(X_train)
Then you can apply the mean and sd of the training set to your test set, using the attributes (attr) that scale stored on the scaled X_train:
X_test_scaled = scale(X_test, center=attr(X_train_scaled, "scaled:center"),
scale=attr(X_train_scaled, "scaled:scale"))
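This gives nearly the same results as the transformations in the example that you posted. The one difference is that R's scale divides by the sample standard deviation (n - 1 in the denominator), whereas StandardScaler divides by the population standard deviation (n in the denominator), so the two outputs differ by a constant factor of sqrt(n / (n - 1)), which is negligible for large n.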
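When comparing the R and scikit-learn outputs numerically, keep in mind that R's scale divides by the sample standard deviation (n - 1) while StandardScaler divides by the population standard deviation (n). The following pure-Python sketch, using made-up numbers and variable names chosen only for this example, mimics both conventions and shows the correction factor that maps one onto the other.

```python
from math import sqrt
from statistics import fmean, pstdev, stdev

# Hypothetical single-feature data.
train = [1.0, 2.0, 3.0, 4.0, 5.0]
test = [3.0, 6.0]

mean = fmean(train)
sd_population = pstdev(train)  # divides by n, the StandardScaler convention
sd_sample = stdev(train)       # divides by n - 1, the convention of R's scale

sklearn_style = [(x - mean) / sd_population for x in test]
r_style = [(x - mean) / sd_sample for x in test]

# Multiplying the R-style values by sqrt(n / (n - 1)) recovers the
# StandardScaler-style values.
n = len(train)
correction = sqrt(n / (n - 1))
r_style_corrected = [v * correction for v in r_style]
```

With only five training rows the factor is about 1.118; with thousands of rows it is close to 1, so the difference rarely matters in practice.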