I tried to train a random forest with cross-validation, using the caret package:
```r
library(caret)

# return_customer is a binary (0/1) outcome variable
set.seed(123)  # set before the split so the partition is reproducible
idx.train <- createDataPartition(y = known$return_customer, p = 0.8, list = FALSE)
train <- known[idx.train, ]
test  <- known[-idx.train, ]

# 10-fold cross-validation, tuned on ROC AUC
k <- 10
model.control <- trainControl(method = "cv", number = k, classProbs = TRUE,
                              summaryFunction = twoClassSummary, allowParallel = TRUE)
rf.parms <- expand.grid(mtry = 1:10)
rf.caret <- train(return_customer ~ ., data = train, method = "rf", ntree = 500,
                  tuneGrid = rf.parms, metric = "ROC", trControl = model.control)
```
When I run the `train` function, I get the following error, even though there are no missing values in `return_customer`:
```
Error in na.fail.default(list(return_customer = c(0L, 0L, 0L, 0L, 0L, : missing values in object
```
I want to understand why the function finds missing values in the data and how I can fix this. I am aware there are similar questions on the forum, but I could not fix my code with them. Thanks!
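The missing values are most likely in your predictors, not in the outcome. The formula `return_customer ~ .` uses every other column of `train` as a predictor, and the formula method of `train()` defaults to `na.action = na.fail`, so a single NA in any of those columns stops the fit. The error message starts with `return_customer` only because R prints the beginning of the data passed to `na.fail`, and the outcome is the first column of that model frame.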
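To confirm where the NAs are, counting them per column is usually enough. The sketch below uses a small hypothetical data frame `toy` (with made-up predictor columns `age` and `spend`) in place of your `train`. It also reproduces the same error with base R's `model.frame()` and `na.fail`, which mirrors what the formula interface does when it builds the model frame.

```r
# Hypothetical toy data: the outcome has no NAs, but one predictor does
toy <- data.frame(
  return_customer = c(0L, 1L, 0L, 1L, 0L),
  age             = c(25, NA, 40, 33, 51),  # hypothetical predictor with one NA
  spend           = c(10, 20, 15, 30, 5)    # hypothetical predictor, complete
)

# Count missing values per column and show only the columns that have any
na.counts <- colSums(is.na(toy))
print(na.counts[na.counts > 0])  # only 'age' is listed

# Building the model frame with na.fail raises the same error as in the
# question, even though return_customer itself is complete
err <- tryCatch(
  model.frame(return_customer ~ ., data = toy, na.action = na.fail),
  error = function(e) conditionMessage(e)
)
print(err)  # "missing values in object"
```

Running `colSums(is.na(train))` on your own data should point directly at the offending columns.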
Try this code to remove the rows that contain missing values:
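```r
# Flag every row of train that has at least one NA in any column
row.has.na <- apply(train, 1, function(x){any(is.na(x))})
# Keep only complete rows (outcome and predictors); train the model on this
predictors_no_NA <- train[!row.has.na, ]
```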
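A slightly more direct alternative is `complete.cases()`. The sketch below, again on a hypothetical `toy` data frame, checks that it selects the same rows as the `apply()` version. It also recodes the 0/1 outcome as a factor: as far as I know, `classProbs = TRUE` with `twoClassSummary` needs a factor outcome whose levels are valid R names, and an integer 0/1 outcome is treated as regression. The labels `"no"`/`"yes"` are placeholders.

```r
# Hypothetical toy data standing in for 'train' from the question
toy <- data.frame(
  return_customer = c(0L, 1L, 0L, 1L, 1L),
  age             = c(25, NA, 40, 33, 51),  # NA in row 2
  spend           = c(10, 20, 15, NA, 5)    # NA in row 4
)

# Same idea as the apply() version: keep only rows without any NA
row.has.na <- apply(toy, 1, function(x) any(is.na(x)))
toy.no.na  <- toy[!row.has.na, ]

# complete.cases() selects exactly the same rows
stopifnot(identical(toy.no.na, toy[complete.cases(toy), ]))

# Recode 0/1 as a factor with valid R level names (placeholder labels)
toy.no.na$return_customer <- factor(toy.no.na$return_customer,
                                    levels = c(0, 1), labels = c("no", "yes"))

print(nrow(toy.no.na))                    # 3 rows remain
print(levels(toy.no.na$return_customer))
```

You can then pass the cleaned data to the original call (`data = predictors_no_NA`), or let caret drop incomplete rows itself by adding `na.action = na.omit` to the `train()` call, since its formula method accepts that argument.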
Hopefully it helps.