predict() method for "mice" package

Tags:

r

imputation

r-mice

I want to create an imputation strategy using the mice function from the mice package. The problem is that I can't seem to find any predict method (or its cousins) for new data in this package.

I want to do something like this:


library(mice)
data(boys)

train_boys <- boys[1:400, ]
test_boys <- boys[401:nrow(boys), ]

mice_object <- mice(train_boys)
# complete() extracts a completed dataset from the mids object
train_complete_boys <- complete(mice_object)

# Here comes a hypothetical method
test_complete_boys <- predict(mice_object, test_boys)

I would like to find some approach that emulates the code above. It's entirely possible to run mice on the train and test datasets separately, but from a logical point of view that seems incorrect: all the information you have is in the train dataset, and observations from the test dataset shouldn't provide information about each other. That's especially true for data whose observations are ordered by time of appearance.

One possible approach is to add rows from the test dataset to the train dataset iteratively, running the imputation every time. However, this seems very inelegant.
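A minimal sketch of that row-by-row approach, assuming each test row is appended to the training data on its own (so test rows never inform each other) and imputed with a single mice() run. The helper name impute_row_by_row is hypothetical, and restricting to the numeric columns age, hgt, wgt, bmi and hc, as well as the small m and maxit values, are assumptions made only to keep each run quick.

```r
library(mice)

# Hypothetical helper (not part of mice): impute each test row separately,
# appended to the full training data, and keep only its completed version.
impute_row_by_row <- function(train, test, ...) {
  completed <- vector("list", nrow(test))
  for (i in seq_len(nrow(test))) {
    stacked <- rbind(train, test[i, , drop = FALSE])
    imp <- mice(stacked, m = 1, printFlag = FALSE, ...)
    # The last row of the completed data is the imputed test row i
    completed[[i]] <- tail(complete(imp), 1)
  }
  do.call(rbind, completed)
}

# Assumption: numeric columns only, to keep every mice() run fast
vars <- c("age", "hgt", "wgt", "bmi", "hc")
train_boys <- boys[1:400, vars]
test_boys <- boys[401:nrow(boys), vars]

# First 5 test rows only, because each row triggers a full mice() run
test_complete_head <- impute_row_by_row(train_boys, test_boys[1:5, ],
                                        maxit = 2, seed = 1)
test_complete_head
```

Every call refits all imputation models from scratch, which is why this approach gets slow as the test set grows.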

So here is the question:

Is there a method in the mice package similar to the generic predict method? If not, what are the possible workarounds?

Thank you!


asked Feb 02 '15 14:02

Loiisso

1 Answer

I think it could be logically incorrect to "predict" missing values in one dataset from another imputed dataset, since the MICE algorithm builds models iteratively, estimating the missing values from the observed values in the dataset you give it.

In other words, when you run mice_object <- mice(train_boys), the algorithm estimates and imputes the NAs from the relationships between variables in train_boys. However, those estimates cannot simply be applied to test_boys, because the relationships between variables in test_boys may differ from those in train_boys. Also, the amount of observed information differs between the two datasets.
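To make that concrete, the sketch below inspects the mids object that mice() returns. As far as I know, it stores the data, the imputed values and the settings used to produce them (method, predictor matrix and so on), but not fitted imputation models that could be re-applied to new rows. The small m and maxit values are assumptions chosen only to keep the example fast.

```r
library(mice)

train_boys <- boys[1:400, ]
imp <- mice(train_boys, m = 2, maxit = 2, printFlag = FALSE, seed = 1)

names(imp)            # components stored in the mids object
imp$method            # imputation method mice chose for each variable
imp$predictorMatrix   # which variables were used as predictors for which
head(imp$imp$hgt)     # imputed hgt values: one row per missing cell, one column per imputation
```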

If you believe the relationships between variables are homogeneous across train_boys and test_boys, how about imputing the NAs before splitting the dataset? For example:


library(mice)

# Impute on the full dataset, then split the completed data
mice_object <- mice(boys)
complete_boys <- complete(mice_object)
train_boys <- complete_boys[1:400, ]
test_boys <- complete_boys[401:nrow(complete_boys), ]
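If your installed mice is recent (3.12 or later, as far as I know), the impute-before-splitting idea can be refined with the ignore argument of mice(): according to the package documentation, rows flagged TRUE are still imputed but do not influence the parameters of the imputation model. That gets close to the requirement that test rows should not inform each other, although each test row's own observed values are still used as predictors when imputing it. The random 70/30 split below is an assumption for illustration only; for time-ordered data you would flag the later rows instead.

```r
library(mice)

# Assumption: mice >= 3.12, where mice() accepts the `ignore` argument
stopifnot(packageVersion("mice") >= "3.12.0")

# Illustrative random split: TRUE marks a test row (about 30% of boys)
set.seed(1)
ignore <- sample(c(TRUE, FALSE), nrow(boys), replace = TRUE, prob = c(0.3, 0.7))

# Rows with ignore = TRUE are still imputed, but the imputation models
# are estimated from the remaining (training) rows only
imp <- mice(boys, m = 1, ignore = ignore, printFlag = FALSE, seed = 1)
complete_boys <- complete(imp)

train_boys <- complete_boys[!ignore, ]
test_boys <- complete_boys[ignore, ]
```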

You can read "Multiple imputation by chained equations: What is it and how does it work?" if you need more information on MICE.


answered Oct 02 '22 21:10

ytu
