
Is it necessary to center and scale data before predicting?

Tags: r, r-caret

In the train function of the caret package it is possible to perform centering and scaling of predictors as in the following example:

knnFit <- train(Direction ~ ., data = training, method = "knn",
                preProcess = c("center","scale"))

Setting this transformation in train should give a better evaluation of the performance of the algorithm during resampling.

In this case, when I use the model to predict the response for new data, do I need to center and scale the new data myself, or is this operation included in the final model?

Is the following operation sufficient?

pred <- predict(knnFit, newdata = test)

Thanks!

amarchin asked Jan 07 '16 12:01



1 Answer

The preProcess steps specified in the call to train are stored in the train object and applied automatically to new data when you call predict on that object, so you do not need to preprocess the new data first. Your operation is sufficient.

Also have a look at the extract from the caret website below. There is also a whole section purely about preprocessing. Definitely worth your time reading through it.

You can find the caret website here.

These processing steps would be applied during any predictions generated using predict.train, extractPrediction or extractProbs (see details later in this document). The pre-processing would not be applied to predictions that directly use the object$finalModel object.
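
The difference between the two prediction paths can be sketched as follows. This is only an illustration under assumptions: it uses the built-in iris data as a stand-in for the training and test sets from the question, and it assumes the preProcess object stored in the train object (knnFit$preProcess) can be applied by hand with predict.

```r
library(caret)

set.seed(1)

# Stand-in data (assumption): iris replaces the question's training/test sets.
idx      <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
training <- iris[idx, ]
test     <- iris[-idx, ]

knnFit <- train(Species ~ ., data = training, method = "knn",
                preProcess = c("center", "scale"))

# Path 1: predict on the train object. The stored centering and scaling
# are applied to the new data automatically.
pred_auto <- predict(knnFit, newdata = test)

# Path 2: predict with finalModel directly. No pre-processing is applied,
# so the stored transformation has to be applied by hand first.
test_scaled <- predict(knnFit$preProcess, test[, 1:4])
pred_manual <- predict(knnFit$finalModel, newdata = test_scaled,
                       type = "class")
```

The two sets of predictions should largely agree, although tied neighbour votes may be resolved differently between calls. Passing the raw test data to knnFit$finalModel would skip the centering and scaling, which is the case the caret documentation warns about.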

phiver answered Sep 19 '22 17:09