I am using the randomForest package to classify a binary outcome variable with the standard process. I first had to coerce all variables to numeric and then used na.roughfix to handle missing values:
data <- read.csv("data.csv")
data <- lapply(data, as.numeric)
data <- na.roughfix(data)
Then I run the model:
model <- randomForest(as.factor(outcome) ~ V1 + V2...+ VN,
data=data,
importance=TRUE,
ntree=500)
and I get the following error:
Error in na.fail.default(list(as.factor(outcome) = c(2L, 2L, 1L, : missing values in object
The na.roughfix imputation should have taken care of this, right? I have gotten it to work before, and research on here suggests that it should work. Any suggestions?
Your lapply line didn't do what you expected it to. lapply always returns a plain list, so the result is no longer a data frame. As a result, the data.frame method of na.roughfix isn't dispatched. The default method is called instead, and it simply returns its first argument unchanged if that argument isn't atomic (which your list clearly isn't). The missing values are therefore still there when randomForest sees the data.
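To see why the dispatch goes wrong, here is a minimal sketch in base R. The toy data frame `df` and its columns are made up for illustration and stand in for the contents of data.csv:

```r
# Hypothetical toy data standing in for data.csv (not the asker's real data)
df <- data.frame(V1 = c("1", "2", NA, "4"),
                 V2 = c(3.5, NA, 1.0, 2.0),
                 stringsAsFactors = FALSE)

# Assigning the lapply() result directly replaces the data frame with a list
as_list <- lapply(df, as.numeric)

class(as_list)          # the class S3 dispatch will see
is.data.frame(as_list)  # no longer a data frame
is.atomic(as_list)      # a list is not atomic either

# The NAs are still present in the list elements
has_na <- any(sapply(as_list, anyNA))
has_na
```

Because the object's class is "list", a generic such as na.roughfix has no data.frame method to pick and falls through to its default method.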
The somewhat sneaky way to convert each column to numeric but retain the data frame property would be:
data[] <- lapply(data,as.numeric)
Alternatively, you could simply convert it back via as.data.frame.
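Both fixes can be sketched as follows, assuming the randomForest package is installed. The toy data frame `df` is again hypothetical and only there to make the example self-contained:

```r
library(randomForest)

# Hypothetical toy data standing in for data.csv
df <- data.frame(V1 = c("1", "2", NA, "4"),
                 V2 = c(3.5, NA, 1.0, 2.0),
                 stringsAsFactors = FALSE)

# Option 1: assign into df[] so the data frame structure is kept
df1 <- df
df1[] <- lapply(df1, as.numeric)

# Option 2: convert the list returned by lapply() back to a data frame
df2 <- as.data.frame(lapply(df, as.numeric))

# Either way, na.roughfix now dispatches on data.frame and imputes
fixed1 <- na.roughfix(df1)
fixed2 <- na.roughfix(df2)

is.data.frame(fixed1)   # still a data frame
any(is.na(fixed1))      # no missing values left
any(is.na(fixed2))
```

After either conversion, the imputed data frame can be passed to randomForest through the data argument as in the question.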