
randomForest: Error in na.fail.default: missing values in object

Tags: r, missing-data

I am trying to train a random forest with cross-validation, using the caret package:

library(caret)

### return_customer is a binary variable
idx.train <- createDataPartition(y = known$return_customer, p = 0.8, list = FALSE)
train <- known[idx.train, ]
test <- known[-idx.train, ]
k <- 10
set.seed(123)
# 10-fold cross-validation, scored by ROC via twoClassSummary
model.control <- trainControl(method = "cv", number = k, classProbs = TRUE,
                              summaryFunction = twoClassSummary, allowParallel = TRUE)
rf.parms <- expand.grid(mtry = 1:10)
rf.caret <- train(return_customer ~ ., data = train, method = "rf", ntree = 500,
                  tuneGrid = rf.parms, metric = "ROC", trControl = model.control)

Running train() produces the following error, even though return_customer contains no missing values:

Error in na.fail.default(list(return_customer = c(0L, 0L, 0L, 0L, 0L, : missing values in object

I want to understand why the function finds missing values in the data and how I can fix this. I am aware there are similar questions on the forum, but I could not fix my code with them. Thanks!

asked Jan 15 '17 at 17:01 by BADS_2016


1 Answer

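The missing values are in your predictors, not in return_customer. With the formula interface, train() defaults to na.action = na.fail, so an NA in any column of train raises this error; the message only prints the start of the data, which happens to be return_customer.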
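To see which predictors are responsible, it can help to count the NAs per column before calling train(). The snippet below is a minimal sketch on a made-up data frame: the columns age, region and basket_value are hypothetical stand-ins, since the real columns of train are not shown in the question.

```r
# Hypothetical stand-in for the asker's `train` data frame;
# only return_customer is a column name taken from the question.
train <- data.frame(
  return_customer = c(0L, 1L, 0L, 1L, 0L),
  age             = c(34, NA, 51, 28, 45),
  region          = factor(c("north", "south", NA, "east", "north")),
  basket_value    = c(12.5, 40.0, 7.9, 19.9, 23.1)
)

# Number of missing values in each column
na.per.column <- colSums(is.na(train))
print(na.per.column)

# Names of the columns that contain at least one NA
cols.with.na <- names(na.per.column)[na.per.column > 0]
print(cols.with.na)
```

On the real data, the same two calls applied to train should show which predictors need cleaning or imputing, while the count for return_customer should be zero.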

Try this code to remove the rows that contain missing values:

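# Flag every row of train that has at least one NA in any column
row.has.na <- apply(train, 1, function(x) any(is.na(x)))
# Keep only the complete rows (outcome and predictors together)
predictors_no_NA <- train[!row.has.na, ]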
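The same filter can be written with base R's complete.cases(), which avoids the row-wise apply() and makes it easy to report how many rows are lost. This is a sketch on a small made-up data frame (age and basket_value are hypothetical columns), not the asker's actual data.

```r
# Hypothetical stand-in for the asker's `train` data frame
train <- data.frame(
  return_customer = c(0L, 1L, 0L, 1L, 0L),
  age             = c(34, NA, 51, 28, 45),
  basket_value    = c(12.5, 40.0, NA, 19.9, 23.1)
)

# TRUE for rows with no NA in any column
keep <- complete.cases(train)
train.complete <- train[keep, ]

# Report how much data the filter costs
cat("Dropped", sum(!keep), "of", nrow(train), "rows\n")

# na.omit() selects the same rows in a single call
stopifnot(identical(rownames(na.omit(train)), rownames(train.complete)))
```

If dropping rows discards too much data, caret's train() also accepts an na.action argument, so passing na.action = na.omit there is another route worth checking against the caret documentation.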

Hopefully it helps.

answered Nov 11 '22 at 05:11 by Shalini Baranwal