How can I perform an operation (like subsetting or adding a calculated column) on each imputed dataset in an object of class mids
from R's package mice
? I would like the result to still be a mids
object.
Edit: Example
library(mice)
data(nhanes)
# create imputed datasets
imput = mice(nhanes)
The imputed datasets are stored as a list of lists
imput$imp
where there are rows only for the observations with imputation for the given variable.
The original (incomplete) dataset is stored here:
imput$data
For example, how would I create a new variable calculated as chl/2
in each of the imputed datasets, yielding a new mids
object?
MICE is a multiple imputation method used to replace missing data values in a data set under certain assumptions about the data missingness mechanism (e.g., the data are missing at random, the data are missing completely at random).
To impute the missing values, mice package use an algorithm in a such a way that use information from other variables in dataset to predict and impute the missing values. Therefore, you may not want to use certain variable as predictors. For example the ID variable does not have any predictive value.
This can be done easily as follows -
Use complete()
to convert a mids object to a long-format data.frame:
long1 <- complete(midsobj1, action='long', include=TRUE)
Perform whatever manipulations needed:
long1$new.var <- long1$chl/2
long2 <- subset(long1, age >= 5)
use as.mids()
to convert back manipulated data to mids object:
midsobj2 <- as.mids(long2)
Now you can use midsobj2
as required. Note that the include=TRUE
(used to include the original data with missing values) is needed for as.mids()
to compress the long-formatted data properly. Note that prior to mice v2.25 there was a bug in the as.mids() function (see this post https://stats.stackexchange.com/a/158327/69413)
EDIT: According to this answer https://stackoverflow.com/a/34859264/4269699 (from what is essentially a duplicate question) you can also edit the mids object directly by accessing $data and $imp. So for example
midsobj2<-midsobj1
midsobj2$data$new.var <- midsobj2$data$chl/2
midsobj2$imp$new.var <- midsobj2$imp$chl/2
You will run into trouble though if you want to subset $imp or if you want to use $call, so I wouldn't recommend this solution in general.
There's a function for this in the basecamb
package:
library(basecamb)
apply_function_to_imputed_data(mids_object, function)
Another option is to calculate the variables before the imputation and place restrictions on them.
library(mice)
# Create the additional variable - this will have missing
nhanes$extra <- nhanes$chl / 2
# Change the method of imputation for extra, so that it always equals chl/2
# Change the predictor matrix so only chl predicts extra
ini <- mice(nhanes, max = 0, print = FALSE)
meth <- ini$meth
meth["extra"] <- "~I(chl / 2)"
pred <- ini$pred # extra isn't used to predict
pred["extra", "chl"] <- 1
# Imputations
imput <- mice(nhanes, seed = 1, pred = pred, meth = meth, print = FALSE)
There are examples in mice: Multivariate Imputation by Chained Equations in R.
There is an overload of with
that can help you here
with(imput, chl/2)
the documentation is given at ?with.mids
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With