I have a factor with missing values. I know that this factor value depends on the combination of a few dates.
I'm having some trouble getting this to work though. Seems both classes are tricky, especially Date.
For a simple example, let's use one Date and one factor:
require(VIM)
toimpute <- data.frame(mydates = seq(as.Date("1990-01-01"),as.Date("2000-01-01"),50),
imputeme = c(NA,NA,rep(c("a","b","c"),24)))
toimpute$imputeme <- as.factor(toimpute$imputeme)
It seems kNN won't go for it:
imputed <- kNN(toimpute,variable = "imputeme")
Error in `[.data.frame`(data.x, , i) : undefined columns selected
mice also doesn't like it. I thought mice was at least supposed to work with factors, though this message says it must be numeric (perhaps it allows factor dependent variables but only numeric for independent variables?):
require(mice)
imputed <- mice(toimpute)
 iter imp variable
  1   1  imputeme
Error in FUN(newX[, i], ...) : 'x' must be numeric
In addition: Warning messages:
1: In var(data[, j], na.rm = TRUE) :
  Calling var(x) on a factor x is deprecated and will become an error.
  Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.
2: In FUN(newX[, i], ...) : NAs introduced by coercion
I guess if nothing else I can do a random forest model to predict the class of the observations with missing data, but if there's a way to do it with one of the more common missing value functions I'd like to know.
To handle imputation for factor variables, you can use aregImpute or transcan from the Hmisc package.
toimpute <- data.frame(mydates = seq(as.Date("1990-01-01"),as.Date("2000-01-01"),50),
imputeme = c(NA,NA,rep(c("a","b","c"),24)))
toimpute$imputeme <- as.factor(toimpute$imputeme)
require(Hmisc)
imputed <- aregImpute(data=toimpute,mydates~imputeme)
table(is.na(imputed))
FALSE
19
The aregImpute documentation describes the formula argument (under Arguments) as follows:
formula
an S model formula. You can specify restrictions for transformations of variables. The function automatically determines which variables are categorical (i.e., factor, category, or character vectors). Binary variables are automatically restricted to be linear. Force linear transformations of continuous variables by enclosing variables by the identity function (I()). It is recommended that factor() or as.factor() do not appear in the formula but instead variables be converted to factors as needed and stored in the data frame. That way imputations for factor variables (done using impute.transcan for example) will be correct. Currently reformM does not handle variables that are enclosed in functions such as I().
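A different workaround, if you would rather stay with kNN from VIM, is to store the date as a plain number before imputing. This is only a sketch, and it rests on an assumption: that the "undefined columns selected" error comes from kNN not knowing how to treat a Date column, which I have not verified in the VIM source. The names toimpute_num and mydays are made up for this example.

```r
library(VIM)

# Same toy data as in the question: 74 dates, the first two factor values missing
toimpute <- data.frame(
  mydates  = seq(as.Date("1990-01-01"), as.Date("2000-01-01"), 50),
  imputeme = factor(c(NA, NA, rep(c("a", "b", "c"), 24)))
)

# Assumption: kNN() cannot use a Date column as a distance variable,
# so represent each date as the number of days since 1970-01-01
toimpute_num <- data.frame(
  mydays   = as.numeric(toimpute$mydates),
  imputeme = toimpute$imputeme
)

# Impute the factor from its 5 nearest neighbours in time
imputed <- kNN(toimpute_num, variable = "imputeme", dist_var = "mydays", k = 5)

# Put the original dates back and look at the rows that were filled in;
# kNN() marks them in the logical column imputeme_imp
imputed$mydates <- toimpute$mydates
imputed[imputed$imputeme_imp, ]
```

With several date columns, the same conversion would be applied to each one and all of the resulting numeric columns listed in dist_var.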