This previous question "How to repeatedly perform glm over multiple dependent variables after mice?" - did not work for me. I don't understand how it incorporates the pooling of the mids.
######################
I need to repeat logistic regression for 7 dependent variables, using the same 5 predictor variables for each regression.
I need to do multiple imputation using mice before I perform the regressions.
I then need an output of exponentiated odds ratios and confidence intervals for each of the dependent variables.
Here is a smaller example:
P1 <- c(7,8,5,NA)
P2 <- c(10,12,11,12)
P3 <- c(1,1,0,1)
P4 <- c(1,1,1,0)
R1 <- c(0,1,0,1)
R2 <- c(0,0,1,0)
R3 <- c(1,1,0,0)
R4 <- c(0,1,1,1)
R5 <- c(1,0,1,0)
df1 <- data.frame(P1,P2,P3,P4,R1,R2,R3,R4,R5)
cols <- c("P3","P4","R1","R2","R3","R4","R5")
df1[cols] <- lapply(df1[cols], factor)
Multiple Imputation:
library(mice)
imp <- mice(df1, maxit=0)
predM <- imp$predictorMatrix
meth <- imp$method
imp2 <- mice(df1, maxit=5,
predictorMatrix = predM,
method = meth, print = F)
Logistic Regression after pooling imputed data:
m1.mi <- with(imp2, glm(R1 ~ P1 + P2 + P3 + P4, family=binomial(link=logit)))
summary(pool(m1.mi),exponentiate=TRUE, conf.int = TRUE)
Is there a way to create a loop, that will produce an odds ratio and CIs for each dependent variable?
Any help appreciated, I apologise for my very limited knowledge in this area.
reformulate is neat for jobs like this. Define the process of one iteration in an anonymous function (those with function(x) or abbreviated \(x)) and put it into lapply. The summary can be subsetted to desired columns just like a data frame using the respective column names in brackets.
Let's use the nhanes data coming with mice in an example.
dvs <- c('age1', 'age2', 'age3')
res <- lapply(dvs, \(x) {
mi <- with(imp, glm(reformulate(c('bmi', 'hyp', 'chl'), x), family=binomial))
s <- summary(pool(mi), exponentiate=TRUE, conf.int=TRUE)
`rownames<-`(s[c('estimate', '2.5 %', '97.5 %')], s$term) |> as.matrix()
}) |> setNames(dvs)
This gives a named list like this one,
res
# $age1
# estimate 2.5 % 97.5 %
# (Intercept) 2.197201e+04 0.0000000 Inf
# bmi 1.999346e+00 0.4691464 8.520549
# hyp 4.331347e-08 0.0000000 Inf
# chl 9.441578e-01 0.8403936 1.060734
#
# $age2
# estimate 2.5 % 97.5 %
# (Intercept) 0.9300514 0.0002224722 3888.105740
# bmi 0.8891490 0.6550513927 1.206907
# hyp 2.8818840 0.2054892623 40.416981
# chl 1.0046189 0.9757483655 1.034344
#
# $age3
# estimate 2.5 % 97.5 %
# (Intercept) 3.5383401 1.087542e-07 1.151206e+08
# bmi 0.6442681 2.792523e-01 1.486403e+00
# hyp 5.9651279 5.954159e-02 5.976117e+02
# chl 1.0323994 9.885890e-01 1.078151e+00
where you can output individual elements in this way:
res$age1
# estimate 2.5 % 97.5 %
# (Intercept) 2.197201e+04 0.0000000 Inf
# bmi 1.999346e+00 0.4691464 8.520549
# hyp 4.331347e-08 0.0000000 Inf
# chl 9.441578e-01 0.8403936 1.060734
Where
class(res$age1)
# [1] "matrix" "array"
If you want class "data.frame" in the list elements, just leave out the |> as.matrix() part.
Data:
data("nhanes", package='mice')
nhanes <- transform(nhanes, age1=age == 1, age2=age == 2, age3=age == 3)
library(mice)
imp0 <- mice(nhanes, maxit=0)
imp <- mice(nhanes, maxit=5,
predictorMatrix=imp0$predictorMatrix,
method=imp0$method, print=F)
You could do something like:
model_list <- lapply( paste0("R",1:5),function(dep_var){
formula <- as.formula(paste0(dep_var,"~ P1 + P2 + P3 + P4"))
m1.mi <- with(imp2, glm(formula, family=binomial(link=logit)))
summary(pool(m1.mi),exponentiate=TRUE, conf.int = TRUE) %>%
mutate(depvar = dep_var)
})
do.call(rbind,model_list)
term estimate std.error statistic df p.value 2.5 % 97.5 % depvar
1 (Intercept) 1.704353e-121 621431.55 -4.474823e-04 0 0.9999859 NaN NaN R1
2 P1 1.241210e+04 38928.27 2.421486e-04 0 0.9999923 NaN NaN R1
3 P2 1.540603e+08 59463.92 3.170469e-04 0 0.9999900 NaN NaN R1
4 P3 NA NA NA NA NA NA NA R1
5 P4 NA NA NA NA NA NA NA R1
6 (Intercept) 1.382828e+06 621431.49 2.275334e-05 0 0.9999993 NaN NaN R2
7 P1 6.490963e-09 38928.28 -4.842972e-04 0 0.9999847 NaN NaN R2
8 P2 1.241211e+04 59463.92 1.585235e-04 0 0.9999950 NaN NaN R2
9 P3 NA NA NA NA NA NA NA R2
10 P4 NA NA NA NA NA NA NA R2
11 (Intercept) 7.231535e-07 621431.51 -2.275334e-05 0 0.9999993 NaN NaN R3
12 P1 1.540603e+08 38928.27 4.842972e-04 0 0.9999847 NaN NaN R3
13 P2 8.056654e-05 59463.92 -1.585235e-04 0 0.9999950 NaN NaN R3
14 P3 NA NA NA NA NA NA NA R3
15 P4 NA NA NA NA NA NA NA R3
16 (Intercept) 4.045223e-105 621431.46 -3.868068e-04 0 0.9999878 NaN NaN R4
17 P1 8.056652e-05 38928.27 -2.421486e-04 0 0.9999923 NaN NaN R4
18 P2 1.912213e+12 59463.92 4.755705e-04 0 0.9999850 NaN NaN R4
19 P3 NA NA NA NA NA NA NA R4
20 P4 NA NA NA NA NA NA NA R4
21 (Intercept) 5.867312e+120 621431.46 4.474823e-04 0 0.9999859 NaN NaN R5
22 P1 8.056650e-05 38928.28 -2.421486e-04 0 0.9999923 NaN NaN R5
23 P2 6.490965e-09 59463.92 -3.170470e-04 0 0.9999900 NaN NaN R5
24 P3 NA NA NA NA NA NA NA R5
25 P4 NA NA NA NA NA NA NA R5
Which produces a long format of all your analysis. The trick is only creating your regression formula with the as.formula function. I use lapply so it produces a list, which is easier to handle. the do.call just bind the results together (and that is why I add what is the dependent variable in your summary)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With