
Predicting responses for new observations using a model developed with multiple imputation via MICE

Tags:

r

predict

r-mice

I have developed a model via multiple imputation using mice. I want to use this model to predict responses, including standard errors, for new observations (which contain no missing data). Passing the model object created by mice to predict doesn't work.

Here is a simple example using the built-in nhanes dataset. Suppose I want to fit a logistic regression model of the form age == 3 ~ bmi + hyp + chl and use it to predict, say, prob(age = 3 | bmi = 20, hyp = 2, chl = 190):

library('mice')
imp <- mice(nhanes, seed = 1)

#create model on each imputed dataset
model <- with(imp, glm(age == 3 ~ bmi + hyp + chl, family = binomial))

#pool models into one
poolmodel <- pool(model)

#new data
newdata <- data.frame(bmi = 20, hyp = 2, chl = 190)

#attempt to predict response using predict() function
pred <- predict(object = model, newdata = newdata, type = 'link', se.fit = TRUE)

Error in UseMethod("predict") : no applicable method for 'predict' applied to an object of class "c('mira', 'matrix')"

pred <- predict(object = poolmodel, newdata = newdata, type = 'link', se.fit = TRUE)

Error in UseMethod("predict") : no applicable method for 'predict' applied to an object of class "c('mipo', 'mira', 'matrix')"

In this example it would be straightforward to calculate the predicted responses and their standard errors manually from the pooled coefficients and the pooled covariance matrix. The real problem, however, is much larger, and the model relies on several splines and interactions, which complicates the calculations considerably. I would rather use existing functions that can do all this for me.

Is there a simple solution in R that will output predicted responses for any given (pooled) model object and any given set of new observations, without having to make cumbersome code modifications?

wjchulme asked Sep 14 '15 16:09



1 Answer

One way to do this is to stack all the imputed datasets together and fit the model once on the stacked data. After that you can call predict on that fit as normal. The parameter estimates produced by pool are the averages of the estimates you get when you fit the same model to each imputed dataset separately, so the point estimates from the stacked fit should be similar. The standard errors from the stacked fit, however, are underestimated, because the stacked rows are treated as if they were independent observations.
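A minimal sketch of the stacking approach, reusing the nhanes example from the question. The second half is not part of the approach above: it is an assumed alternative that calls predict on each per-imputation fit stored in model$analyses and combines the results with Rubin's rules, shown only to make the standard-error caveat concrete. The names stacked, fit_stacked, preds, pooled_fit and pooled_se are illustrative.

```r
library(mice)

imp <- mice(nhanes, seed = 1, printFlag = FALSE)
newdata <- data.frame(bmi = 20, hyp = 2, chl = 190)

# --- Stacking: combine the completed datasets and fit once ---
stacked <- mice::complete(imp, action = "long")   # m * n rows, plus .imp and .id
fit_stacked <- glm(age == 3 ~ bmi + hyp + chl, family = binomial, data = stacked)
pred_stacked <- predict(fit_stacked, newdata = newdata,
                        type = "link", se.fit = TRUE)
stacked_fit <- unname(pred_stacked$fit)
stacked_se  <- unname(pred_stacked$se.fit)   # too small: rows are not independent

# --- Assumed alternative: predict per imputation, then pool the predictions ---
model <- with(imp, glm(age == 3 ~ bmi + hyp + chl, family = binomial))
preds <- lapply(model$analyses, predict, newdata = newdata,
                type = "link", se.fit = TRUE)
q <- vapply(preds, function(p) unname(p$fit), numeric(1))       # predictions
u <- vapply(preds, function(p) unname(p$se.fit)^2, numeric(1))  # their variances
m <- length(q)

pooled_fit <- mean(q)
# Rubin's rules: within-imputation variance + (1 + 1/m) * between-imputation variance
pooled_se <- sqrt(mean(u) + (1 + 1 / m) * var(q))

c(stacked_fit = stacked_fit, stacked_se = stacked_se,
  pooled_fit = pooled_fit, pooled_se = pooled_se)
```

Both predictions are on the link (log-odds) scale; plogis() converts them to probabilities. Because each element of model$analyses is an ordinary glm fit, splines and interactions in the formula are handled by predict itself, so no manual design-matrix work should be needed.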

Thaole answered Oct 20 '22 00:10
