Missing value error in the randomForest package of R

Question

I am using the randomForest package to classify a binary outcome variable. I first coerced all variables to numeric and then used na.roughfix to impute the missing values:

data <- read.csv("data.csv")
data <- lapply(data, as.numeric)
data <- na.roughfix(data)

Then I run the model:

model <- randomForest(as.factor(outcome) ~ V1 + V2...+ VN, 
         data=data, 
         importance=TRUE,
         ntree=500)

and I get the following error:

Error in na.fail.default(list(as.factor(outcome) = c(2L, 2L, 1L, : missing values in object

The na.roughfix imputation should have taken care of this, right? I have gotten it to work before, and other answers here suggest that it should work. Any suggestions?

joran · Accepted Answer

Your lapply line didn't do what you expected. Its result is no longer a data frame, just a plain list. As a result, the data.frame method of na.roughfix isn't dispatched; the default method is called instead, and it simply returns its first argument unchanged if that argument isn't atomic (which your list clearly isn't). The missing values are therefore still in the data when randomForest applies na.fail.
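To see the class change for yourself, here is a minimal base-R sketch. The toy data frame df and its values are made up for illustration and are not from the question:

```r
# Hypothetical toy data with one missing value per column.
df <- data.frame(V1 = c("1", "2", NA),
                 V2 = c(3.5, NA, 1.0),
                 stringsAsFactors = FALSE)

# lapply always returns a plain list, so the data.frame class is lost.
as_list <- lapply(df, as.numeric)

class(as_list)          # "list"
is.data.frame(as_list)  # FALSE
is.atomic(as_list)      # FALSE, so the default na.roughfix method hands it back untouched
```

Because S3 dispatch goes by class, anything called on as_list afterwards sees a list, not a data frame.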

The somewhat sneaky way to convert each column to numeric but retain the data frame property would be:

data[] <- lapply(data,as.numeric)

Alternatively, you could simply convert it back via as.data.frame.
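Putting both options together, a rough end-to-end sketch might look like the following. It assumes the randomForest package is installed; the data frame df, its values, and the two-predictor formula are hypothetical stand-ins for your real data:

```r
library(randomForest)

# Hypothetical toy data; your real data comes from read.csv().
df <- data.frame(outcome = c(1, 0, 1, 0, 1, 0),
                 V1 = c("1", "2", NA, "4", "5", "6"),
                 V2 = c(3.5, NA, 1.0, 2.0, 4.5, 0.5),
                 stringsAsFactors = FALSE)

# Option 1: assign into data[] so the data frame structure is kept.
fixed1 <- df
fixed1[] <- lapply(fixed1, as.numeric)

# Option 2: convert the list returned by lapply back to a data frame.
fixed2 <- as.data.frame(lapply(df, as.numeric))

# Either result is a data frame, so the data.frame method of
# na.roughfix is dispatched and the NAs are imputed.
imputed <- na.roughfix(fixed1)
any(is.na(imputed))  # FALSE

# With no missing values left, the model fits without the na.fail error.
model <- randomForest(as.factor(outcome) ~ V1 + V2,
                      data = imputed,
                      importance = TRUE,
                      ntree = 500)
```

Checking is.data.frame() on the object right before calling na.roughfix is a cheap way to catch this kind of mistake early.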

Tags: r, missing-data, machine-learning, random-forest

bencrosier
