
Perform operation on each imputed dataset in R's MICE

Tags: r, r-mice

How can I perform an operation (like subsetting or adding a calculated column) on each imputed dataset in an object of class mids from R's package mice? I would like the result to still be a mids object.

Edit: Example

library(mice)
data(nhanes)

# create imputed datasets
imput = mice(nhanes)

The imputed values are stored as a list of data frames, one per variable:

imput$imp

Each element has rows only for the observations that were imputed for the given variable.

The original (incomplete) dataset is stored here:

imput$data

For example, how would I create a new variable calculated as chl/2 in each of the imputed datasets, yielding a new mids object?

half-pass asked Oct 31 '14 03:10



4 Answers

This can be done easily as follows:

Use complete() to convert a mids object to a long-format data.frame:

 long1 <- complete(midsobj1, action='long', include=TRUE)

Perform whatever manipulations needed:

 long1$new.var <- long1$chl/2
 long2 <- subset(long1, age >= 5)

Use as.mids() to convert the manipulated data back to a mids object:

 midsobj2 <- as.mids(long2)

Now you can use midsobj2 as required. Note that include=TRUE (which keeps the original data with missing values) is needed for as.mids() to rebuild the mids object from the long-format data properly. Also note that prior to mice v2.25 there was a bug in the as.mids() function (see this post: https://stats.stackexchange.com/a/158327/69413).
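The three steps above can be put together as a minimal sketch on the nhanes data from the question. The object names imp, long and imp2 and the column name half_chl are illustrative choices, not part of the original answer, and the sketch assumes a recent version of mice in which complete(..., action = "long") produces the .imp and .id columns that as.mids() expects by default.

```r
library(mice)

# Impute the bundled nhanes data (seed fixed only for reproducibility)
imp <- mice(nhanes, seed = 1, printFlag = FALSE)

# Stack the original data (.imp == 0) and every completed dataset
long <- complete(imp, action = "long", include = TRUE)

# Add the derived column to all stacked datasets at once
long$half_chl <- long$chl / 2

# Rebuild a mids object from the modified long-format data
imp2 <- as.mids(long)

# The new column is now available in each completed dataset
head(complete(imp2, 1))
```

Because half_chl is missing wherever chl is missing in the original rows (.imp == 0), as.mids() treats it as an imputed variable and takes its values from the stacked imputations.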

EDIT: According to this answer https://stackoverflow.com/a/34859264/4269699 (from what is essentially a duplicate question), you can also edit the mids object directly by accessing $data and $imp. For example:

 midsobj2<-midsobj1
 midsobj2$data$new.var <- midsobj2$data$chl/2
 midsobj2$imp$new.var <- midsobj2$imp$chl/2

You will run into trouble though if you want to subset $imp or if you want to use $call, so I wouldn't recommend this solution in general.

wjchulme answered Oct 17 '22 05:10


There's a function for this in the basecamb package:

library(basecamb)
# my_function is a placeholder for your own function, applied to each imputed dataset
apply_function_to_imputed_data(mids_object, my_function)

p-mq answered Sep 22 '22 07:09


Another option is to calculate the variables before the imputation and place restrictions on them.

library(mice)

# Create the additional variable - this will have missing values wherever chl is missing
nhanes$extra <- nhanes$chl / 2

# Change the method of imputation for extra, so that it always equals chl/2
# Change the predictor matrix so only chl predicts extra
ini <- mice(nhanes, maxit = 0, printFlag = FALSE)

meth <- ini$method
meth["extra"] <- "~I(chl / 2)"

pred <- ini$predictorMatrix  # extra isn't used to predict
pred["extra", "chl"] <- 1

# Imputations
imput <- mice(nhanes, seed = 1, predictorMatrix = pred, method = meth, printFlag = FALSE)

This technique is known as passive imputation. There are examples in the paper "mice: Multivariate Imputation by Chained Equations in R".

user20650 answered Oct 17 '22 06:10


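There is a mids method for with() that can help you here:

with(imput, chl/2)

The documentation is given at ?with.mids. Note that the result is a mira object holding the value of the expression for each imputed dataset, not a mids object.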
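As a small sketch of what with() gives back for a mids object, the example below evaluates chl/2 in each completed dataset and inspects the result. The names imp and fit are illustrative, and the seed is fixed only so the run is reproducible.

```r
library(mice)

# Impute the bundled nhanes data
imp <- mice(nhanes, seed = 1, printFlag = FALSE)

# Evaluate the expression once per completed dataset
fit <- with(imp, chl / 2)

# The result is a mira object; the per-dataset values live in $analyses
class(fit)
length(fit$analyses)

# Values computed from the first completed dataset
head(fit$analyses[[1]])
```

This is convenient for computing something on every imputed dataset, but since the result is not a mids object it cannot be passed back to functions that expect one.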

MrFlick answered Oct 17 '22 07:10