
Caret Package: Stratified Cross Validation in Train Function

Is there a way to perform stratified cross validation when using the train function to fit a model to a large, imbalanced data set? I know straightforward k-fold cross validation is possible, but my categories are highly unbalanced. I've seen discussion of this topic but no definitive answer.

Thanks in advance.

Windstorm1981 asked Mar 10 '16 04:03



1 Answer

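trainControl has an argument called 'index' that lets you supply the resampling indices yourself. createFolds samples within the levels of the outcome when it is a factor, so passing its training indices to 'index' gives stratified k-fold cross validation.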

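library(caret)

folds <- 4
# createFolds samples within the levels of a factor, so each fold keeps
# roughly the original class ratio; returnTrain = TRUE returns the
# training-row indices that trainControl's index argument expects
cvIndex <- createFolds(factor(training$Y), k = folds, returnTrain = TRUE)
tc <- trainControl(index = cvIndex,
                   method = 'cv',
                   number = folds)

# ntree is passed through train() to randomForest
rfFit <- train(Y ~ ., data = training,
               method = "rf",
               trControl = tc,
               maximize = TRUE,
               verbose = FALSE, ntree = 1000)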
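
To see what "stratified" means for the folds above, here is a minimal base-R sketch that builds stratified folds by hand and checks the class ratio in each held-out fold. It is only an illustration, not caret's implementation; the toy outcome y and the names fold_id, train_index and holdout_pos_share are hypothetical and do not come from the question.

```r
# Hypothetical toy outcome: 90 "neg" and 10 "pos" cases (not from the question)
set.seed(42)
y <- factor(c(rep("neg", 90), rep("pos", 10)))
k <- 5

# Assign fold ids separately within each class so every fold gets its share
fold_id <- integer(length(y))
for (lev in levels(y)) {
  rows <- which(y == lev)
  # fold labels 1..k recycled to the class size, then shuffled
  fold_id[rows] <- sample(rep(seq_len(k), length.out = length(rows)))
}

# Training-row indices per fold: a named list with one integer vector per
# resample, the same shape that createFolds(returnTrain = TRUE) returns
train_index <- lapply(seq_len(k), function(i) which(fold_id != i))
names(train_index) <- paste0("Fold", seq_len(k))

# Share of "pos" cases in each held-out fold
holdout_pos_share <- sapply(seq_len(k), function(i) mean(y[fold_id == i] == "pos"))
print(holdout_pos_share)
```

With this toy data every held-out fold contains 18 "neg" and 2 "pos" rows, so each fold keeps the 10% minority share of the full data set. A list shaped like train_index should be usable as trainControl(index = train_index) in place of cvIndex, though that has not been run against the asker's data.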

KST answered Dec 08 '22 10:12