I am receiving the following error in R when stacking using the caret package.
"Error: At least one of the class levels is not a valid R variable name; This will cause errors when class probabilities are generated because the variables names will be converted to not5, X5sets . Please use factor levels that can be used as valid R variable names (see ?make.names for help)."
Below is the code I am trying to run.
library(caret)          # provides trainControl() and resamples()
library(caretEnsemble)  # provides caretList()
control <- trainControl(method="repeatedcv", number=10, repeats=3, savePredictions=TRUE, classProbs=TRUE)
algorithmList <- c('rpart', 'knn', 'svmRadial')
set.seed(222)
models <- caretList(Tsets ~ MatchSurface + MatchRound + AgeDiff + SameHand + HeightDiff, data=up_sample, trControl=control, methodList=algorithmList)
results <- resamples(models)
When I remove classProbs=TRUE, the code runs, but I want to keep it because code I run afterwards requires it. All of my variables are factors or integers, and I have changed all classes so that they no longer contain "0"s and "1"s, so I can't figure out why the code won't run.
I have attached a picture of the data structure below. Would be great if anyone had some advice.
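Try renaming the levels of your target variable so that they are valid R names, for example "yes"/"no" instead of 1/0. The error message shows one level being converted to X5sets, which suggests that a level of Tsets starts with a digit.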
When caretList() fits the models with classProbs=TRUE, the class probabilities are stored in columns named after the factor levels of the outcome, as the error message itself says. Those names must be valid R variable names, so a level cannot start with a number or contain spaces. You can convert the level names of the outcome to valid labels with the following code.
library(dplyr)
# Assign the result back, otherwise up_sample is left unchanged
up_sample <- up_sample %>%
  mutate(Tsets = factor(Tsets,
                        labels = make.names(levels(Tsets))))
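To see what make.names() does before touching the real data, here is a minimal base R sketch. The vector tsets is a made-up stand-in for the Tsets column, and its levels "5sets" and "not5" are only an assumption inferred from the error message, not taken from the actual data.

```r
# Hypothetical stand-in for up_sample$Tsets; the level names are assumed
tsets <- factor(c("5sets", "not5", "5sets", "not5"))

# A level is a valid R name if make.names() leaves it unchanged
is_valid <- make.names(levels(tsets)) == levels(tsets)
print(setNames(is_valid, levels(tsets)))

# Rename the levels in place; every observation keeps its class
levels(tsets) <- make.names(levels(tsets))
print(levels(tsets))
```

Assigning to levels() only relabels the classes, so the counts per class stay the same. Running the same check on levels(up_sample$Tsets) should show which level triggers the error.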