I would like to use group_by %>% do(tidy(*))
to run several linear regression models and to extract model results to the data frame. The data frame should include the following for each model: outcome variable, exposure variable, sample size, beta coefficient, SE and p-value.
library(tidyverse)
data("mtcars")
outcomes <- c("wt, mpg", "hp", "disp")
exposures <- c("gear", "vs", "am")
covariates <- c("drat", "qsec")
The models should regress each outcome on each exposure adjusted for all covariates e.g.
lm(wt ~ factor(gear)+drat+qsec, mtcars, na.action = na.omit)
lm(wt ~ factor(vs)+drat+qsec, mtcars, na.action = na.omit)
etc...
The final code might look something like this?
models <- (mtcars %>%
gather(x_var, x_value, -c(y_var, y_i, cv1:cv3)) %>%
group_by(y_var, x_var) %>%
do(broom::tidy(lm(y_i ~ x_value + cv1 + cv2 + cv3, data = .))))
Here's a solution that first creates the formulas for each model you want to run and then calls the right variables from the dataset you want to analyse, instead of reshaping the dataset itself and apply the models:
library(tidyverse)
library(broom)
outcomes <- c("wt", "mpg", "hp", "disp")
exposures <- c("gear", "vs", "am")
covariates <- c("drat", "qsec")
expand.grid(outcomes, exposures, covariates) %>%
group_by(Var1, Var2) %>%
summarise(Var3 = paste0(Var3, collapse = "+")) %>%
rowwise() %>%
summarise(frm = paste0(Var1, "~factor(", Var2, ")+", Var3)) %>%
group_by(model_id = row_number(),
frm) %>%
do(tidy(lm(.$frm, data = mtcars))) %>%
ungroup()
# # A tibble: 52 x 7
# model_id frm term estimate std.error statistic p.value
# <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 1 wt~factor(gear)+drat+qsec (Intercept) 9.25 2.17 4.27 0.000218
# 2 1 wt~factor(gear)+drat+qsec factor(gear)4 -0.187 0.493 -0.378 0.708
# 3 1 wt~factor(gear)+drat+qsec factor(gear)5 -0.703 0.518 -1.36 0.186
# 4 1 wt~factor(gear)+drat+qsec drat -1.03 0.425 -2.42 0.0227
# 5 1 wt~factor(gear)+drat+qsec qsec -0.121 0.0912 -1.32 0.196
# 6 2 wt~factor(vs)+drat+qsec (Intercept) 4.35 2.28 1.91 0.0663
# 7 2 wt~factor(vs)+drat+qsec factor(vs)1 -1.04 0.416 -2.49 0.0189
# 8 2 wt~factor(vs)+drat+qsec drat -0.918 0.263 -3.49 0.00160
# 9 2 wt~factor(vs)+drat+qsec qsec 0.147 0.106 1.39 0.175
# 10 3 wt~factor(am)+drat+qsec (Intercept) 8.29 1.31 6.33 0.000000766
# # ... with 42 more rows
Very similar process in case you prefer to use map
from purrr
package instead of do
:
expand.grid(outcomes, exposures, covariates) %>%
group_by(Var1, Var2) %>%
summarise(Var3 = paste0(Var3, collapse = "+")) %>%
rowwise() %>%
summarise(frm = paste0(Var1, "~factor(", Var2, ")+", Var3)) %>%
group_by(model_id = row_number()) %>%
mutate(model = map(frm, ~tidy(lm(., data = mtcars)))) %>%
unnest() %>%
ungroup()
Remember that the key to this approach is creating the formulas. So, the code will become simpler if you manage to specify your variables in a slightly different way and help creating the formulas with less code than before:
outcomes <- c("wt", "mpg", "hp", "disp")
exposures <- c("gear", "vs", "am")
covariate1 <- "drat"
covariate2 <- "qsec"
expand.grid(outcomes, exposures, covariate1, covariate2) %>%
transmute(frm = paste0(Var1, "~factor(", Var2, ")+", Var3, "+", Var4)) %>%
group_by(model_id = row_number()) %>%
mutate(model = map(frm, ~tidy(lm(., data = mtcars)))) %>%
unnest() %>%
ungroup()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With