Is it necessary to center and scale data before predicting?

Tags:

r

r-caret

In the train function of the caret package, it is possible to center and scale the predictors, as in the following example:

knnFit <- train(Direction ~ ., data = training, method = "knn",
                preProcess = c("center","scale"))

Setting this transformation in train should give a better evaluation of the performance of the algorithm during resampling.
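The resampling benefit comes from estimating the centering and scaling statistics on the training part of each resample only, so the held-out part never influences them. Below is a rough base-R sketch of that idea. It is not caret's internal code: the simulated data, the 5-fold split and the use of class::knn (a recommended package shipped with R) are all assumptions made for illustration.

```r
# Sketch: 5-fold cross-validation where centering/scaling statistics are
# estimated on the training folds only (similar in spirit to what train()
# does when preProcess = c("center", "scale") is set). Data are simulated.
library(class)  # provides knn()

set.seed(1)
n <- 200
x <- data.frame(a = rnorm(n, sd = 1), b = rnorm(n, sd = 100))
y <- factor(ifelse(x$a + x$b / 100 + rnorm(n, sd = 0.5) > 0, "Up", "Down"))

folds <- sample(rep(1:5, length.out = n))
acc <- numeric(5)
for (f in 1:5) {
  tr <- folds != f
  # Means and standard deviations come from this fold's training rows only
  mu <- colMeans(x[tr, ])
  s  <- apply(x[tr, ], 2, sd)
  x_tr <- scale(x[tr, ],  center = mu, scale = s)
  x_te <- scale(x[!tr, ], center = mu, scale = s)  # held-out rows reuse them
  pred <- knn(x_tr, x_te, cl = y[tr], k = 5)
  acc[f] <- mean(pred == y[!tr])
}
mean(acc)  # cross-validated accuracy estimate
```

Scaling the whole data set once before splitting would let the held-out rows leak into the means and standard deviations, which is what doing the preprocessing inside train avoids.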

In this case, when I use the model to predict the response for new data, do I need to center and scale the new data myself, or is this operation included in the final model?

Is the following operation sufficient?

pred <- predict(knnFit, newdata = test)

Thanks!


asked Jan 07 '16 12:01

amarchin

1 Answer

The preProcess steps specified in train are stored in the fitted object and applied automatically to the new data when you call predict, so you do not need to preprocess the new data first. Your operation is sufficient.
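Conceptually, the stored step reuses the means and standard deviations estimated from the training data; it does not re-estimate them from the new data. A minimal base-R sketch of that idea follows. The columns x1 and x2 and the simulated values are hypothetical, and caret does this for you; the code only illustrates what happens.

```r
# Hypothetical training and new data with two numeric predictors
set.seed(42)
training <- data.frame(x1 = rnorm(50, mean = 10, sd = 2),
                       x2 = rnorm(50, mean = -3, sd = 5))
test     <- data.frame(x1 = rnorm(5, mean = 10, sd = 2),
                       x2 = rnorm(5, mean = -3, sd = 5))

# Estimated once, from the training data only
mu  <- colMeans(training)
sds <- apply(training, 2, sd)

# Reused unchanged on the new data (not recomputed from the test set)
test_scaled <- scale(test, center = mu, scale = sds)
test_scaled
```

With caret installed, predict(preProcess(training, method = c("center", "scale")), test) should give the same values; an object of that kind is what train keeps in knnFit$preProcess.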

Also have a look at the extract from the caret website below. The site also has a whole section on preprocessing that is well worth reading.

You can find the caret website here.

These processing steps would be applied during any predictions generated using predict.train, extractPrediction or extractProbs (see details later in this document). The pre-processing would not be applied to predictions that directly use the object$finalModel object.
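The difference described in that passage can be checked directly by comparing predict() on the train object with predict() on knnFit$finalModel. This sketch assumes caret is installed; the simulated data and the column names x1 and x2 are hypothetical. For method = "knn" the final model is a caret knn3 fit, and the fitted preprocessing is stored in knnFit$preProcess.

```r
# Requires the caret package. Data are simulated and hypothetical.
library(caret)

set.seed(123)
make_data <- function(n) {
  x1 <- rnorm(n, sd = 1)
  x2 <- rnorm(n, sd = 100)  # very different scale from x1
  Direction <- factor(ifelse(x1 + x2 / 100 + rnorm(n, sd = 0.5) > 0, "Up", "Down"))
  data.frame(x1, x2, Direction)
}
training <- make_data(200)
test     <- make_data(50)

knnFit <- train(Direction ~ ., data = training, method = "knn",
                preProcess = c("center", "scale"))

# 1) predict.train: the stored centering/scaling is applied for you
pred_train <- predict(knnFit, newdata = test)

# 2) finalModel: you must apply the stored preprocessing yourself
test_pp    <- predict(knnFit$preProcess, test[, c("x1", "x2")])
pred_final <- predict(knnFit$finalModel, newdata = test_pp, type = "class")

# 3) finalModel on raw data: the model sees unscaled values,
#    so these predictions are not trustworthy
pred_raw <- predict(knnFit$finalModel, newdata = test[, c("x1", "x2")], type = "class")

mean(pred_train == pred_final)  # agreement between routes 1 and 2
mean(pred_train == pred_raw)    # agreement between routes 1 and 3
```

Routes 1 and 2 should agree, since predict.train applies knnFit$preProcess before calling the final model; route 3 skips that step and generally will not match.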


answered Sep 19 '22 17:09

phiver
