
predict() method for "mice" package

I want to create an imputation strategy using the mice function from the mice package. The problem is that I can't seem to find any predict method (or its cousins) for new data in this package.

I want to do something like this:

require(mice)
data(boys)

train_boys <- boys[1:400,]
test_boys <- boys[401:nrow(boys),]

mice_object <- mice(train_boys)
train_complete_boys <- complete(mice_object)

# Here comes a hypothetical method
test_complete_boys <- predict(mice_object, test_boys)

I would like to find an approach that emulates the code above. It is certainly possible to run mice separately on the train and test datasets, but that seems logically incorrect: all the information you have is in the train dataset, and observations from the test dataset shouldn't provide information about each other. That's especially true when the observations can be ordered by time of appearance.

One possible approach is to add rows from the test dataset to the train dataset one at a time, rerunning the imputation each time. However, this seems very inelegant.

So here is the question:

Is there a method for the mice package that would be similar to the general predict method? If not, what are the possible workarounds?

Thank you!

Loiisso asked Feb 02 '15 14:02


1 Answer

I think it could be logically incorrect to "predict" missing values from another imputed dataset, since the MICE algorithm builds models iteratively to estimate the missing values from the observed values in the dataset you give it.

In other words, when you run mice_object <- mice(train_boys), the algorithm estimates and imputes the NAs using the relationships between variables in train_boys. However, those estimates cannot simply be applied to test_boys, because the relationships between variables in test_boys may differ from those in train_boys. The amount of observed information also differs between the two datasets.

If you believe the relationships between variables are homogeneous across train_boys and test_boys, how about doing NA imputation before splitting the dataset? i.e.:

mice_object <- mice(boys)
complete_boys <- complete(mice_object)
train_boys <- complete_boys[1:400,]
test_boys <- complete_boys[401:nrow(complete_boys),]

You can read "Multiple imputation by chained equations: What is it and how does it work?" if you need more information on MICE.
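If you would rather not let the test rows influence the imputation models, a possible workaround is sketched below. It assumes a mice release that provides the ignore argument of mice() (it is not in older versions, so check ?mice first). Rows flagged TRUE in ignore are still imputed, but they are left out when the imputation models are fitted. The is_test name is just an illustrative helper, and the split mirrors the one in the question.

```r
library(mice)

# Flag the test rows (rows 401 onwards, as in the question).
# `is_test` is a hypothetical helper name used only for this sketch.
is_test <- seq_len(nrow(boys)) > 400

# Assumption: this mice version supports `ignore`.
# Rows with ignore = TRUE do not contribute to fitting the imputation
# models, but their missing values are still imputed.
mice_object <- mice(boys, ignore = is_test, printFlag = FALSE, seed = 1)

# Take one completed dataset and split it back into train and test.
complete_boys <- complete(mice_object)
train_complete_boys <- complete_boys[!is_test, ]
test_complete_boys <- complete_boys[is_test, ]
```

This is not a true predict() method: the models are refitted on every call, so the train rows have to be passed in again whenever new test rows arrive.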

ytu answered Oct 02 '22 21:10