 

R MICE imputation failing

I am really baffled about why my imputation is failing in R's mice package. I am attempting a very simple operation with the following data frame:

dfn <- read.table(text =
"a b c  d
 0 1 0  1
 1 0 0  0
 0 0 0  0
NA 0 0  0
 0 0 0 NA", header = TRUE)

I then use mice in the following way to perform a simple mean imputation:

library(mice)
imp <- mice(dfn, method = "mean", m = 1, maxit = 1)
filled <- complete(imp)

However, my completed data looks like this:

filled
#     a b c  d
#1 0.00 1 0  1
#2 1.00 0 0  0
#3 0.00 0 0  0
#4 0.25 0 0  0
#5 0.00 0 0 NA

Why am I still getting this trailing NA? This is the simplest failing example I could construct, but my real data set is much larger and I am just trying to get a sense of where things are going wrong. Any help would be greatly appreciated!

asked Nov 09 '22 22:11 by mjnichol



1 Answer

I'm not really sure how accurate this is, but here is an attempt. Even though method="mean" is supposed to impute the unconditional mean, it appears from the documentation that the predictorMatrix is not adjusted to match, so mice still treats the other variables as predictors.

Normally, leftover NAs occur because the predictors suffer from multicollinearity or because there are too few cases per variable (so that the imputation model cannot be estimated). However, method="mean" does not use any predictors, so it shouldn't be affected by either problem.
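A quick way to test the multicollinearity explanation on this example is to look at the data directly. The sketch below uses only base R (no mice functions) and rebuilds the `dfn` from the question. Column c is constant, and on the four rows where d is observed it is an exact copy of b. My assumption is that mice's default pre-imputation checks flag d as collinear with b and drop it, which would leave it unimputed even with method = "mean"; inspecting `imp$loggedEvents` on the returned object should confirm or rule this out.

```r
# Rebuild the example data from the question
dfn <- read.table(text = "a b c  d
 0 1 0  1
 1 0 0  0
 0 0 0  0
NA 0 0  0
 0 0 0 NA", header = TRUE)

# Columns whose observed values never vary carry no information as predictors
is_constant <- sapply(dfn, function(x) var(x, na.rm = TRUE) < 1e-12)
constant_cols <- names(dfn)[is_constant]
print(constant_cols)

# Restrict to the rows where d is observed and compare it with b
obs_d <- !is.na(dfn$d)
same_on_observed <- identical(dfn$b[obs_d], dfn$d[obs_d])
print(same_on_observed)

# Correlation of b and d over the complete cases
r_bd <- cor(dfn$b, dfn$d, use = "complete.obs")
print(r_bd)
```

If b and d are perfectly correlated on the observed rows, dropping b as a predictor (or perturbing the data) is not needed for mean imputation; it is enough to make sure mice is not treating d as a redundant copy of b.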

Here is what I did:

dfn <- read.table(text="a b c  d
 0 1 0  1
 1 0 0  0
 0 0 0  0
NA 0 0  0
 0 0 0 NA", header=TRUE)

library(mice)
imp <- mice(dfn, method = "mean", predictorMatrix = diag(ncol(dfn)))
complete(imp)

#      a b c    d
# 1 0.00 1 0 1.00
# 2 1.00 0 0 0.00
# 3 0.00 0 0 0.00
# 4 0.25 0 0 0.00
# 5 0.00 0 0 0.25

You can try this on your actual data set, but you should check the results carefully. For example, compute the observed mean of each column:

sapply(dfn, function(x) mean(x, na.rm = TRUE))

Each column's mean should be identical to the value imputed in that column. Please let me know if this solves your problem.
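To cross-check what mice returns, here is a minimal base-R sketch of unconditional mean imputation on the same `dfn`. The helper `mean_impute()` is hypothetical, written only for this comparison; it is not part of mice and not necessarily how mice computes its imputations internally.

```r
# Rebuild the example data from the question
dfn <- read.table(text = "a b c  d
 0 1 0  1
 1 0 0  0
 0 0 0  0
NA 0 0  0
 0 0 0 NA", header = TRUE)

# Hypothetical helper: replace each NA with the mean of the observed
# values in the same column (unconditional mean imputation)
mean_impute <- function(df) {
  df[] <- lapply(df, function(x) {
    x[is.na(x)] <- mean(x, na.rm = TRUE)
    x
  })
  df
}

filled_manual <- mean_impute(dfn)
print(filled_manual)
```

Here a[4] and d[5] should both become 0.25, matching the means returned by the sapply() call above, so the result can be compared cell by cell with complete(imp).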

answered Nov 15 '22 06:11 by SimonG
