 

Applying k-fold Cross Validation model using caret package

Let me start by saying that I have read many posts on Cross Validation and it seems there is much confusion out there. My understanding is simply this:

  1. Perform k-fold Cross Validation, e.g. with 10 folds, to understand the average error across the folds.
  2. If acceptable then train the model on the complete data set.
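The two steps above can be sketched by hand without caret. This is only an illustrative sketch, using the built-in iris data (with Species as the response) in place of a real data set:

```r
# A minimal manual version of the two-step procedure above.
library(rpart)

set.seed(42)
k <- 10
# randomly assign each row to one of k folds
folds <- sample(rep(1:k, length.out = nrow(iris)))

# Step 1: estimate the average accuracy across the k folds
fold_acc <- sapply(1:k, function(i) {
  fit  <- rpart(Species ~ ., data = iris[folds != i, ])
  pred <- predict(fit, iris[folds == i, ], type = "class")
  mean(pred == iris$Species[folds == i])
})
mean(fold_acc)

# Step 2: if the estimate is acceptable, train on the complete data set
final_model <- rpart(Species ~ ., data = iris)
```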

I am attempting to build a decision tree using rpart in R and taking advantage of the caret package. Below is the code I am using.

# load libraries
library(caret)
library(rpart)

# define training control
train_control <- trainControl(method = "cv", number = 10)

# train the model
model <- train(resp ~ ., data = mydat, trControl = train_control, method = "rpart")

# make predictions
predictions <- predict(model, mydat)

# append predictions
mydat <- cbind(mydat, predictions)

# summarize results
confusionMatrix <- confusionMatrix(mydat$predictions, mydat$resp)

I have one question regarding the caret train application. I have read the train section of A Short Introduction to the caret Package, which states that the "optimal parameter set" is determined during the resampling process.

In my example, have I coded it up correctly? Do I need to define the rpart parameters within my code, or is my code sufficient?

pmanDS asked Nov 02 '15




1 Answer

When you perform k-fold cross validation, you are already making a prediction for each sample, just over 10 different models (presuming k = 10). There is no need to make predictions on the complete data, as you already have predictions from the k different models.

What you can do is the following:

train_control <- trainControl(method = "cv", number = 10, savePredictions = TRUE)

Then

model <- train(resp ~ ., data = mydat, trControl = train_control, method = "rpart")

If you want to see the observed values and the predictions in a nice format, simply type:

model$pred
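Putting it together, here is a sketch on the built-in iris data (not the asker's mydat). It uses savePredictions = "final", which in recent caret versions keeps only the predictions for the selected tuning parameter, so each row appears exactly once:

```r
library(caret)
library(rpart)

set.seed(1)
train_control <- trainControl(method = "cv", number = 10,
                              savePredictions = "final")
model <- train(Species ~ ., data = iris,
               trControl = train_control, method = "rpart")

# model$pred holds one out-of-fold prediction per row,
# with columns such as pred, obs, rowIndex and Resample
head(model$pred)

# so the confusion matrix can be built from the held-out
# predictions rather than by re-predicting the training data
confusionMatrix(model$pred$pred, model$pred$obs)
```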

Also, for the second part of your question, caret should handle all of the parameter tuning. You can try tuning the parameters manually if you desire.
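For example, a sketch of manual tuning on the built-in iris data: for method = "rpart", caret's tuning parameter is the complexity parameter cp, and tuneGrid lets you fix the candidate values yourself instead of using caret's default grid.

```r
library(caret)
library(rpart)

set.seed(1)
# candidate values of rpart's complexity parameter to resample over
grid <- expand.grid(cp = c(0.001, 0.01, 0.05, 0.1))

model <- train(Species ~ ., data = iris,
               trControl = trainControl(method = "cv", number = 10),
               method = "rpart", tuneGrid = grid)

model$bestTune  # the cp value with the best resampled accuracy
```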

zacdav answered Oct 15 '22