 

Applying k-fold Cross Validation model using caret package

Let me start by saying that I have read many posts on Cross Validation and it seems there is much confusion out there. My understanding is simply this:

  1. Perform k-fold Cross Validation, e.g. with 10 folds, to understand the average error across the folds.
  2. If acceptable then train the model on the complete data set.
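The two steps above can be sketched by hand without caret. This is only an illustrative sketch, using the built-in iris data (with Species as the response) in place of a real data set:

```r
# A minimal manual version of the two-step procedure above.
library(rpart)

set.seed(42)
k <- 10
# randomly assign each row to one of k folds
folds <- sample(rep(1:k, length.out = nrow(iris)))

# Step 1: estimate the average accuracy across the k folds
fold_acc <- sapply(1:k, function(i) {
  fit  <- rpart(Species ~ ., data = iris[folds != i, ])
  pred <- predict(fit, iris[folds == i, ], type = "class")
  mean(pred == iris$Species[folds == i])
})
mean(fold_acc)

# Step 2: if the estimate is acceptable, train on the complete data set
final_model <- rpart(Species ~ ., data = iris)
```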

I am attempting to build a decision tree using rpart in R and taking advantage of the caret package. Below is the code I am using.

# load libraries
library(caret)
library(rpart)

# define training control
train_control <- trainControl(method = "cv", number = 10)

# train the model
model <- train(resp ~ ., data = mydat, trControl = train_control, method = "rpart")

# make predictions
predictions <- predict(model, mydat)

# append predictions
mydat <- cbind(mydat, predictions)

# summarize results
confusionMatrix <- confusionMatrix(mydat$predictions, mydat$resp)

I have one question regarding the caret train application. I have read the train section of A Short Introduction to the caret Package, which states that the "optimal parameter set" is determined during the resampling process.

In my example, have I coded it up correctly? Do I need to define the rpart parameters within my code, or is my code sufficient?

pmanDS asked Nov 02 '15




1 Answer

When you perform k-fold cross validation, you are already making a prediction for each sample, just over 10 different models (presuming k = 10). There is no need to make predictions on the complete data, as you already have predictions from the k different models.

What you can do is the following:

train_control <- trainControl(method = "cv", number = 10, savePredictions = TRUE)

Then

model <- train(resp ~ ., data = mydat, trControl = train_control, method = "rpart")

If you want to see the observed values and the predictions in a nice format, simply type:

model$pred
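Putting it together, here is a sketch on the built-in iris data (not the asker's mydat). It uses savePredictions = "final", which in recent caret versions keeps only the predictions for the selected tuning parameter, so each row appears exactly once:

```r
library(caret)
library(rpart)

set.seed(1)
train_control <- trainControl(method = "cv", number = 10,
                              savePredictions = "final")
model <- train(Species ~ ., data = iris,
               trControl = train_control, method = "rpart")

# model$pred holds one out-of-fold prediction per row,
# with columns such as pred, obs, rowIndex and Resample
head(model$pred)

# so the confusion matrix can be built from the held-out
# predictions rather than by re-predicting the training data
confusionMatrix(model$pred$pred, model$pred$obs)
```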

Also, for the second part of your question, caret should handle all of the parameter tuning. You can try tuning the parameters manually if you desire.
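For example, a sketch of manual tuning on the built-in iris data: for method = "rpart", caret's tuning parameter is the complexity parameter cp, and tuneGrid lets you fix the candidate values yourself instead of using caret's default grid.

```r
library(caret)
library(rpart)

set.seed(1)
# candidate values of rpart's complexity parameter to resample over
grid <- expand.grid(cp = c(0.001, 0.01, 0.05, 0.1))

model <- train(Species ~ ., data = iris,
               trControl = trainControl(method = "cv", number = 10),
               method = "rpart", tuneGrid = grid)

model$bestTune  # the cp value with the best resampled accuracy
```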

zacdav answered Oct 15 '22