In the train function of the caret package it is possible to perform centering and scaling of predictors as in the following example:
knnFit <- train(Direction ~ ., data = training, method = "knn",
preProcess = c("center","scale"))
Setting this transformation in train should give a better evaluation of the performance of the algorithm during resampling.
In this case, when I use the model to predict the response for new data, should I take care of centering and scaling myself, or is this operation included in the final model?
Is the following operation sufficient?
pred <- predict(knnFit, newdata = test)
Thanks!
Yes, you need to care about it. Your model has learned from data on a specific scale, so new data must be converted to that same scale before the model predicts on it.
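As a rough illustration of what "the same scale" means, here is a minimal base-R sketch. The data frames `train_x` and `new_x` are made-up toy data, not from the question. The point is that the centering and scaling parameters come from the training data and are then reused on the new data.

```r
# Hypothetical toy data; train_x and new_x are illustrative names only.
train_x <- data.frame(a = c(1, 2, 3, 4, 5), b = c(10, 20, 30, 40, 50))
new_x   <- data.frame(a = c(3, 6), b = c(30, 60))

# Learn the centering and scaling parameters from the training data only.
train_mean <- colMeans(train_x)
train_sd   <- apply(train_x, 2, sd)

# Apply the *training* parameters to the new data, never the new data's own.
new_scaled <- scale(new_x, center = train_mean, scale = train_sd)
print(new_scaled)
```

The first new row equals the training means, so it maps to 0 in both columns. The second row is expressed in training standard deviations.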
Centering and scaling is the most straightforward data transformation. It centers and scales each variable to mean 0 and standard deviation 1. This ensures that the criterion for finding linear combinations of the predictors is based on how much variation they explain, and it improves numerical stability.
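A small sketch of that transformation using base R's `scale()`. The simulated columns `height_cm` and `income` are assumptions for illustration only.

```r
set.seed(1)

# Two simulated predictors on very different scales (illustrative data).
x <- data.frame(height_cm = rnorm(50, mean = 170, sd = 10),
                income    = rnorm(50, mean = 40000, sd = 8000))

# scale() subtracts each column's mean and divides by its standard deviation.
x_std <- scale(x)

# After the transformation each column has mean ~0 and standard deviation 1.
col_means <- colMeans(x_std)
col_sds   <- apply(x_std, 2, sd)
print(round(col_means, 10))
print(col_sds)
```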
In regression, it is often recommended to center the features so that the predictors have a mean of 0. This makes the intercept term easier to interpret: it becomes the expected value of Y when the predictors are set to their means.
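The intercept interpretation can be checked with a quick sketch on simulated data. The variables `x` and `y` are invented for this example. With ordinary least squares, centering the predictor makes the intercept equal to the mean of the response and leaves the slope unchanged.

```r
set.seed(42)

# Simulated data for illustration only.
x <- runif(30, min = 50, max = 100)
y <- 3 + 0.5 * x + rnorm(30)

# Fit on the raw predictor: the intercept is the prediction at x = 0,
# which lies far outside the observed range of x.
fit_raw <- lm(y ~ x)

# Fit on the centered predictor: the intercept is the prediction at mean(x).
x_c <- x - mean(x)
fit_centered <- lm(y ~ x_c)

print(coef(fit_raw))
print(coef(fit_centered))
print(mean(y))
```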
Generally, it is not necessary. Scaling the inputs helps to avoid the situation where one or several features dominate the others in magnitude, so that the model hardly picks up the contribution of the smaller-scale variables even if they are strong predictors.
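To make the "dominating feature" effect concrete, here is a toy sketch for a distance-based method such as k-nearest neighbours, assuming Euclidean distance. The `pts` data frame is invented for illustration. On the raw data the nearest neighbour of the first point is decided almost entirely by `income`; after standardizing, `age` contributes as well and the nearest neighbour changes.

```r
# Invented toy data: age in years, income in currency units.
pts <- data.frame(age    = c(25, 60, 26, 40, 45),
                  income = c(50000, 50100, 51000, 47000, 54000))

# Pairwise Euclidean distances on the raw and on the standardized data.
d_raw <- as.matrix(dist(pts))
d_std <- as.matrix(dist(scale(pts)))

# Nearest neighbour of point 1 (excluding itself) in each case.
nn_raw <- unname(which.min(d_raw[1, -1])) + 1
nn_std <- unname(which.min(d_std[1, -1])) + 1

print(nn_raw)  # 2: a 60-year-old with a similar income
print(nn_std)  # 3: a 26-year-old with a slightly higher income
```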
The preProcess specified in the train object is applied to the new data automatically, so there is no need to preprocess the new data first. Your operation is sufficient.
Also have a look at the extract from the caret website below. There is also a whole section purely about preprocessing. Definitely worth your time reading through it.
You can find the caret website here.
These processing steps would be applied during any predictions generated using predict.train, extractPrediction or extractProbs (see details later in this document). The pre-processing would not be applied to predictions that directly use the object$finalModel object.