 

Plot Multiple Imputation Results

I have run multiple imputation on the missing data in my questionnaire study using the mice package in R, and then performed a linear regression on the pooled imputed variables. I can't work out how to extract single pooled variables and plot them in a graph. Any ideas?

For example:

imp <- mice(questionnaire)
fit <- with(imp, lm(APE ~ TMAS + APB + APA + FOAP))
summary(pool(fit))

I want to plot pooled APE by TMAS.

Reproducible example using the nhanes dataset that ships with mice:

library(mice)
nhanes
imp <- mice(nhanes)
fit <- with(imp, lm(bmi ~ chl + hyp))
fit
summary(pool(fit))

I would like to plot pooled chl against pooled bmi (for example).

The best I have been able to achieve is:

mat <- complete(imp, "long")
plot(mat$chl ~ mat$bmi)

I believe this plots the data from all 5 imputations stacked together, which is not quite what I am looking for.
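A quick way to check what complete(imp, "long") actually returns is sketched below. It assumes mice 3.x, and the seed value is an arbitrary choice added only for reproducibility. It shows that the long format simply stacks the 5 completed copies of the 25 nhanes rows, and that observed values are identical in every copy; only the originally missing cells differ.

```r
library(mice)

# Same imputation as in the question; seed = 123 is an arbitrary choice
imp <- mice(nhanes, seed = 123, printFlag = FALSE)
mat <- complete(imp, "long")

# The long format stacks the m = 5 completed copies of the 25 rows,
# tagged by .imp (imputation number) and .id (original row number)
nrow(mat)           # 125
table(mat$.imp)     # 25 rows per imputation

# One column of bmi values per imputation (25 x 5 matrix, rows in .id order)
bmi_by_imp <- do.call(cbind, split(mat$bmi, mat$.imp))

# Spread of each row's bmi across the 5 copies
spread <- apply(bmi_by_imp, 1, function(r) diff(range(r)))
observed <- !is.na(nhanes$bmi)

all(spread[observed] == 0)   # TRUE: observed values are never changed
spread[!observed]            # imputed cells can differ between copies
```

So plotting mat directly draws every row 5 times, once per imputation.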

Frank Zafka asked Aug 27 '10 08:08



1 Answer

The underlying with.mids() function carries out the regression on each imputed data frame separately. So it is not one regression but 5 regressions. pool() then averages the estimated coefficients and combines the within-imputation and between-imputation variances (Rubin's rules), so that the standard errors reflect the extra uncertainty introduced by imputation.
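The pooling step can be reproduced by hand to make this concrete. The sketch below assumes mice 3.x, where with() returns a mira object whose analyses element holds the 5 lm fits and pool(fit)$pooled is a data frame with estimate and t (total variance) columns; the seed is arbitrary. It applies Rubin's rules directly and compares the result with pool().

```r
library(mice)

imp <- mice(nhanes, seed = 123, printFlag = FALSE)
fit <- with(imp, lm(bmi ~ chl + hyp))

# One lm fit per imputed dataset (m = 5 by default)
length(fit$analyses)

# Coefficients and sampling variances from each fit (3 x 5 matrices)
est <- sapply(fit$analyses, coef)
w   <- sapply(fit$analyses, function(f) diag(vcov(f)))
m   <- ncol(est)

# Rubin's rules
qbar <- rowMeans(est)            # pooled estimate: plain average of the 5
ubar <- rowMeans(w)              # average within-imputation variance
b    <- apply(est, 1, var)       # between-imputation variance
tvar <- ubar + (1 + 1/m) * b     # total variance of the pooled estimate

# Compare with mice's own pooling
pooled <- pool(fit)$pooled
cbind(manual = qbar, mice = pooled$estimate)
cbind(manual = tvar, mice = pooled$t)
```

The pooled coefficient is just the mean of the 5 per-imputation coefficients; only the variance gets the extra between-imputation term.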

So there are no single pooled variables to plot. What you could do is average the 5 imputed datasets and reconstruct a "regression line" from the pooled coefficients, e.g.:

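library(mice)

# Impute and fit as in the question (seed added for reproducibility)
imp <- mice(nhanes, seed = 123, printFlag = FALSE)
fit <- with(imp, lm(bmi ~ chl + hyp))
mat <- complete(imp, "long")

# Averaged imputed data: mean of the 5 imputed values per original row (.id)
combchl <- tapply(mat$chl, mat$.id, mean)
combbmi <- tapply(mat$bmi, mat$.id, mean)
combhyp <- tapply(mat$hyp, mat$.id, mean)

# Pooled coefficients, in the order (Intercept), chl, hyp
# (mice >= 3.0; older versions stored these in pool(fit)$qbar)
coefs <- pool(fit)$pooled$estimate

# Prediction grid: vary chl over its range, hold hyp at its mean
x <- data.frame(
        int = rep(1, 25),
        chl = seq(min(combchl), max(combchl), length.out = 25),
        hyp = rep(mean(combhyp), 25)
      )

# Fitted bmi from the pooled regression (drop() turns the 25 x 1 matrix into a vector)
y <- drop(as.matrix(x) %*% coefs)

# a plot
plot(combbmi ~ combchl)
lines(x$chl, y, col = "red")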
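A different view, not part of the original answer, is to draw the 5 per-imputation regression lines next to the pooled one, which makes the "5 regressions, not one" point visible. This is a sketch assuming mice 3.x (fit$analyses and pool(fit)$pooled$estimate); the grid size and holding hyp at its mean are arbitrary choices.

```r
library(mice)

imp <- mice(nhanes, seed = 123, printFlag = FALSE)
fit <- with(imp, lm(bmi ~ chl + hyp))
mat <- complete(imp, "long")

# Grid over chl; hyp held at its mean in the stacked data (an arbitrary choice)
grid <- data.frame(
  chl = seq(min(mat$chl), max(mat$chl), length.out = 25),
  hyp = mean(mat$hyp)
)

# Fitted bmi on the grid from each of the 5 per-imputation fits (25 x 5)
preds <- sapply(fit$analyses, predict, newdata = grid)

# Fitted bmi from the pooled coefficients: (Intercept), chl, hyp
b <- pool(fit)$pooled$estimate
pooled_line <- b[1] + b[2] * grid$chl + b[3] * grid$hyp

# Stacked data coloured by imputation, grey per-imputation lines, red pooled line
plot(bmi ~ chl, data = mat, col = as.integer(mat$.imp), pch = 19)
matlines(grid$chl, preds, col = "grey50", lty = 1)
lines(grid$chl, pooled_line, col = "red", lwd = 2)
```

Because the model is linear in its coefficients, the red pooled line is exactly the average of the 5 grey lines; the spread of the grey lines gives a rough visual sense of the between-imputation variability.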
Joris Meys answered Sep 27 '22 21:09