Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Caret::train - Values Not Imputed

Tags:

r

r-caret

I am trying to impute values by passing "knnImpute" to the preProcess argument of Caret's train() method. Based on the following example, it appears that the values are not imputed, remain as NA and are then ignored. What am I doing wrong?

Any help is much appreciated.

library("caret")

set.seed(1234)
data(iris)

# mark 8 of the cells as NA, so they can be imputed
row <- sample (1:nrow (iris), 8)
iris [row, 1] <- NA

# split test vs training
train.index <- createDataPartition (y = iris[,5], p = 0.80, list = F)
train <- iris [ train.index, ]
test  <- iris [-train.index, ]

# train the model after imputing the missing data
fit <- train (Species ~ ., 
              train, 
              preProcess = c("knnImpute"), 
              na.action  = na.pass, 
              method     = "rpart" )
test$species.hat <- predict (fit, test)

# there is 1 obs. (of 30) in the test set equal to NA  
# this 1 obs. was not returned from predict
Error in `$<-.data.frame`(`*tmp*`, "species.hat", value = c(1L, 1L, 1L,  : 
  replacement has 29 rows, data has 30

UPDATE: I have been able to use the preProcess function directly to impute the values. I still don't understand why this does not seem to occur within the train function.

# attempt to impute using nearest neighbors
x <- iris [, 1:4]
pp <- preProcess (x, method = c("knnImpute"))
x.imputed <- predict (pp, newdata = x)

# expect all NAs were populated with an imputed value
stopifnot( all (!is.na (x.imputed)))
stopifnot( length (x) == length (x.imputed))
like image 842
Nick Allen Avatar asked Nov 18 '13 18:11

Nick Allen


1 Answers

See ?predict.train:

 ## S3 method for class 'train'
 predict(object, newdata = NULL, type = "raw", na.action = na.omit, ...)

There is an na.omit here too:

 > length(predict (fit, test))
 [1] 29
 > length(predict (fit, test, na.action = na.pass))
 [1] 30

Max

like image 176
topepo Avatar answered Nov 05 '22 13:11

topepo