I'm looking for R packages or machine learning models/algorithms (like randomForest, glmnet, gbdt, etc.) that can handle NAs directly, as opposed to dropping any row or column that contains an NA. I'm not looking to impute. Any suggestions?
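The CART algorithm, as implemented in the rpart package, handles NAs rather seamlessly: it uses surrogate splits to route observations whose primary split variable is missing. From there you can always turn to bagged trees built on rpart, probably via the ipred package.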
I've heard that multivariate adaptive regression splines (mars in the mda package) handle missing data well, although I don't have much experience with them.
Also, k-nearest-neighbor models (and, I think, kernel methods more generally) can be adapted to deal with missing values in a fairly straightforward manner, although implementations may not do so out of the box. Presumably it would be as simple as adjusting the distance metric to consider only the features observed in both points (pairwise complete cases). I'm less familiar with specific R packages that go beyond vanilla kNN models.
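To make the pairwise-complete idea concrete, here is a minimal sketch in plain Python, chosen only to keep it dependency-free; it is not taken from any R package. The names pairwise_complete_distance and knn_predict are hypothetical. The rescaling by the share of jointly observed features is my own assumption, added so that pairs with few shared features do not look artificially close.

```python
import math
from collections import Counter


def pairwise_complete_distance(a, b):
    """Euclidean distance over the features observed (non-NaN) in both a and b.

    Assumption: the sum of squares is rescaled by (total features / shared
    features). Returns infinity when the two points share no observed feature.
    """
    shared = [(x, y) for x, y in zip(a, b)
              if not (math.isnan(x) or math.isnan(y))]
    if not shared:
        return math.inf
    sum_sq = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(sum_sq * len(a) / len(shared))


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training rows closest to query."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: pairwise_complete_distance(train_X[i], query),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy data with missing entries encoded as NaN (made up for illustration).
NA = math.nan
train_X = [[1.0, 2.0, NA], [1.5, NA, 0.5], [8.0, 9.0, 7.5], [NA, 8.5, 8.0]]
train_y = ["low", "low", "high", "high"]

prediction = knn_predict(train_X, train_y, [1.2, NA, 0.4], k=2)
print(prediction)
```

No row is dropped and nothing is imputed: each pair of points is compared only on the features they both have. Whether the rescaling is appropriate depends on the data, so treat it as one option rather than the standard choice.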