 

R packages/models that can handle NAs

I'm looking for R packages or machine learning models/algorithms (like randomForest, glmnet, gbdt, etc.) that can handle NAs directly, as opposed to ignoring any row or column that contains an NA. I'm not looking to impute. Any suggestions?

asked Oct 09 '22 at 06:10 by screechOwl


1 Answer

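The CART algorithm, as implemented in the rpart package, handles NAs fairly seamlessly: observations whose split variable is missing are routed down the tree using surrogate splits on other variables. From there you can always turn to bagged trees built with rpart, for example via the ipred package.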
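To make the surrogate-split idea concrete, here is a minimal Python sketch (not rpart's code, and deliberately simplified). Assumptions: `None` stands in for NA, the primary split `x[primary] <= primary_thr` has already been chosen, and the helper names `goes_left`, `best_surrogate` and `route_left` are hypothetical. It picks the split on another variable that best agrees with the primary split on rows where both are observed, then uses it for rows missing the primary variable. It omits details of rpart's actual procedure, such as case weights and how multiple surrogates are ranked and retained.

```python
from typing import List, Optional, Tuple

Row = List[Optional[float]]                 # None plays the role of R's NA
Surrogate = Tuple[int, float, bool, float]  # (var, threshold, left_if_le, agreement)

def goes_left(value: float, threshold: float, left_if_le: bool) -> bool:
    # Split "value <= threshold" sends rows left; left_if_le=False reverses it.
    return (value <= threshold) == left_if_le

def best_surrogate(rows: List[Row], primary: int, primary_thr: float,
                   candidates: List[int]) -> Optional[Surrogate]:
    # Pick the candidate split that best mimics the primary split,
    # scored only on rows where both variables are observed.
    best: Optional[Surrogate] = None
    for k in candidates:
        if k == primary:
            continue
        both = [r for r in rows if r[primary] is not None and r[k] is not None]
        if not both:
            continue
        target = [r[primary] <= primary_thr for r in both]
        for thr in sorted({r[k] for r in both}):
            for left_if_le in (True, False):
                hits = sum(goes_left(r[k], thr, left_if_le) == t
                           for r, t in zip(both, target))
                agree = hits / len(both)
                if best is None or agree > best[3]:
                    best = (k, thr, left_if_le, agree)
    return best

def route_left(row: Row, primary: int, primary_thr: float,
               surrogate: Optional[Surrogate], majority_left: bool) -> bool:
    # Use the primary split if its variable is observed, otherwise the
    # surrogate, otherwise fall back to the majority direction.
    if row[primary] is not None:
        return row[primary] <= primary_thr
    if surrogate is not None:
        k, thr, left_if_le, _ = surrogate
        if row[k] is not None:
            return goes_left(row[k], thr, left_if_le)
    return majority_left

# Toy data: column 1 tracks column 0, so it should make a perfect surrogate.
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [None, 15.0]]
sur = best_surrogate(rows, primary=0, primary_thr=2.5, candidates=[0, 1])
print(sur)                                                    # (1, 20.0, True, 1.0)
print(route_left(rows[4], 0, 2.5, sur, majority_left=False))  # True
```

The last row has no value for column 0, yet it is still sent left because its column 1 value (15.0) falls on the same side as the rows it agrees with. No row is dropped and nothing is imputed.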

I've heard that multivariate adaptive regression splines (mars in the mda package) handle missing data well, although I don't have much experience with it.

Also, k-nearest-neighbor models (and, I think, kernel methods more generally) can be adapted to deal with missing values in a fairly straightforward way, although implementations may not do that out of the box. Presumably it would be as simple as adjusting the distance metric to consider only the coordinates observed in both cases being compared (pairwise complete cases). I'm less familiar with specific R packages that go beyond the vanilla kNN models.
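A minimal Python sketch of the "pairwise complete" distance idea, assuming `None` marks an NA and using hypothetical helper names (`pairwise_complete_dist`, `knn_predict`). The distance uses only coordinates observed in both vectors and scales the squared sum by (total coordinates / shared coordinates), similar in spirit to scikit-learn's `nan_euclidean_distances`; that rescaling is one reasonable choice, not the only one. The kNN vote itself is unchanged.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence

Vec = Sequence[Optional[float]]  # None stands in for R's NA

def pairwise_complete_dist(a: Vec, b: Vec) -> float:
    # Euclidean distance over coordinates observed in BOTH vectors, scaled up
    # by (total coords / shared coords) so pairs sharing fewer coordinates are
    # not artificially close. No shared coordinate -> infinitely far.
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    sq = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(sq * len(a) / len(shared))

def knn_predict(train_X: List[Vec], train_y: List[str], query: Vec, k: int = 3) -> str:
    # Plain majority-vote kNN; only the distance function knows about NAs.
    dists = [(pairwise_complete_dist(x, query), y) for x, y in zip(train_X, train_y)]
    dists.sort(key=lambda p: p[0])
    nearest = [y for d, y in dists[:k] if d != math.inf]
    if not nearest:
        raise ValueError("query shares no observed coordinate with any training row")
    return Counter(nearest).most_common(1)[0][0]

train_X = [[1.0, 1.0], [1.2, None], [5.0, 5.0], [None, 5.2]]
train_y = ["a", "a", "b", "b"]
print(knn_predict(train_X, train_y, [1.1, None], k=3))  # a
```

Both the training rows and the query contain NAs, and every comparison simply uses whatever coordinates the two rows have in common; neighbors with nothing in common are excluded from the vote.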

answered Oct 13 '22 at 12:10 by joran