Is there a way to perform stratified cross-validation when using caret's train() function to fit a model to a large, imbalanced data set? I know straightforward k-fold cross-validation is possible, but my classes are highly unbalanced. I've seen discussion of this topic but no definitive answer.
Thanks in advance.
The caret package (short for Classification And REgression Training) is a set of functions that attempt to streamline the process of creating predictive models for complex classification and regression problems. It includes tools for data splitting, pre-processing, and feature selection, among others.
Stratified k-fold cross-validation is a variant of k-fold cross-validation for classification problems: each of the k folds keeps the same class ratio as the original dataset, as closely as the class counts allow.
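To make the idea concrete, here is a minimal base-R sketch that assigns fold numbers separately within each class. The helper stratified_folds() is hypothetical and is not part of caret. With 90 majority rows, 10 minority rows and k = 5, every fold ends up with exactly 18 majority and 2 minority rows, which is the original 9:1 ratio.

```r
# Hypothetical helper (not a caret function): assign each row to one of k
# folds *within* its class, so every fold keeps roughly the original ratio.
stratified_folds <- function(y, k) {
  y <- factor(y)
  fold_id <- integer(length(y))
  for (lvl in levels(y)) {
    rows <- which(y == lvl)
    # Deal this class's rows evenly over folds 1..k, then shuffle the labels
    fold_id[rows] <- sample(rep_len(seq_len(k), length(rows)))
  }
  fold_id
}

set.seed(1)
y <- factor(c(rep("majority", 90), rep("minority", 10)))
fold_id <- stratified_folds(y, k = 5)
counts <- table(fold_id, y)
print(counts)
```

Only the order of rows within each fold depends on the seed; the per-fold class counts are fixed by rep_len().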
caret is arguably one of the largest packages in R and covers most common classification and regression workflows.
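trainControl() has an index argument that lets you specify the resampling indices yourself. Each element of index holds the row numbers used for training in one resampling iteration, so you can build stratified folds with createFolds(), which samples within the levels of a factor outcome, and pass them in with returnTrain = TRUE: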
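library(caret)

set.seed(42)  # make the fold assignment reproducible
folds <- 4
# createFolds() samples within each level of a factor outcome, so each fold
# keeps roughly the class proportions of the full data. returnTrain = TRUE
# returns the training rows, which is what trainControl(index = ...) expects.
cvIndex <- createFolds(factor(training$Y), k = folds, returnTrain = TRUE)
tc <- trainControl(index = cvIndex,
                   method = "cv",
                   number = folds)
# training$Y should be a factor so that train() fits a classification model
rfFit <- train(Y ~ ., data = training,
               method = "rf",
               trControl = tc,
               maximize = TRUE,
               verbose = FALSE, ntree = 1000)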
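To check that the indices passed to trainControl() really are stratified, you can look at the minority-class share in each training set. This is a sketch on simulated data: the outcome vector y and its 950/50 split are made up for illustration. Each of the four training sets should contain roughly 5% minority rows, matching the full data.

```r
library(caret)

set.seed(1)
# Simulated, highly imbalanced outcome: 950 "no" vs 50 "yes" (5% minority)
y <- factor(c(rep("no", 950), rep("yes", 50)))

# Same call pattern as above: training-row indices for 4 stratified folds
cvIndex <- createFolds(y, k = 4, returnTrain = TRUE)

# Share of the minority class inside each training set
minority_share <- sapply(cvIndex, function(idx) mean(y[idx] == "yes"))
print(round(minority_share, 3))
```

The factor() call in the answer matters: for a numeric outcome, createFolds() groups values by quantiles instead, and a rare 0/1 class can land in the same group as the majority, so the folds are no longer stratified by class.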