Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using group by and tidy to run several models and extract results to dataframe

I would like to use group_by %>% do(tidy(*)) to run several linear regression models and to extract model results to the data frame. The data frame should include the following for each model: outcome variable, exposure variable, sample size, beta coefficient, SE and p-value.

library(tidyverse)
data("mtcars")
outcomes <- c("wt, mpg", "hp", "disp")
exposures <- c("gear", "vs", "am")
covariates <- c("drat", "qsec")

The models should regress each outcome on each exposure adjusted for all covariates e.g.

lm(wt ~ factor(gear)+drat+qsec, mtcars, na.action = na.omit)
lm(wt ~ factor(vs)+drat+qsec, mtcars, na.action = na.omit)
etc...

The final code might look something like this?

models <- (mtcars %>%
gather(x_var, x_value, -c(y_var, y_i, cv1:cv3)) %>%
group_by(y_var, x_var) %>%
do(broom::tidy(lm(y_i ~ x_value + cv1 + cv2 + cv3, data = .))))
like image 841
aelhak Avatar asked May 17 '18 11:05

aelhak


1 Answers

Here's a solution that first creates the formulas for each model you want to run and then calls the right variables from the dataset you want to analyse, instead of reshaping the dataset itself and apply the models:

library(tidyverse)
library(broom)

outcomes <- c("wt", "mpg", "hp", "disp")
exposures <- c("gear", "vs", "am")
covariates <- c("drat", "qsec")

expand.grid(outcomes, exposures, covariates) %>%
  group_by(Var1, Var2) %>%
  summarise(Var3 = paste0(Var3, collapse = "+")) %>%
  rowwise() %>%
  summarise(frm = paste0(Var1, "~factor(", Var2, ")+", Var3)) %>%
  group_by(model_id = row_number(),
           frm) %>%
  do(tidy(lm(.$frm, data = mtcars))) %>%
  ungroup()

# # A tibble: 52 x 7
#   model_id frm                       term          estimate std.error statistic     p.value
#      <int> <chr>                     <chr>            <dbl>     <dbl>     <dbl>       <dbl>
# 1        1 wt~factor(gear)+drat+qsec (Intercept)      9.25     2.17       4.27  0.000218   
# 2        1 wt~factor(gear)+drat+qsec factor(gear)4   -0.187    0.493     -0.378 0.708      
# 3        1 wt~factor(gear)+drat+qsec factor(gear)5   -0.703    0.518     -1.36  0.186      
# 4        1 wt~factor(gear)+drat+qsec drat            -1.03     0.425     -2.42  0.0227     
# 5        1 wt~factor(gear)+drat+qsec qsec            -0.121    0.0912    -1.32  0.196      
# 6        2 wt~factor(vs)+drat+qsec   (Intercept)      4.35     2.28       1.91  0.0663     
# 7        2 wt~factor(vs)+drat+qsec   factor(vs)1     -1.04     0.416     -2.49  0.0189     
# 8        2 wt~factor(vs)+drat+qsec   drat            -0.918    0.263     -3.49  0.00160    
# 9        2 wt~factor(vs)+drat+qsec   qsec             0.147    0.106      1.39  0.175      
# 10        3 wt~factor(am)+drat+qsec   (Intercept)      8.29     1.31       6.33  0.000000766
# # ... with 42 more rows

Very similar process in case you prefer to use map from purrr package instead of do:

expand.grid(outcomes, exposures, covariates) %>%
  group_by(Var1, Var2) %>%
  summarise(Var3 = paste0(Var3, collapse = "+")) %>%
  rowwise() %>%
  summarise(frm = paste0(Var1, "~factor(", Var2, ")+", Var3)) %>%
  group_by(model_id = row_number()) %>%
  mutate(model = map(frm, ~tidy(lm(., data = mtcars)))) %>%
  unnest() %>%
  ungroup()

Remember that the key to this approach is creating the formulas. So, the code will become simpler if you manage to specify your variables in a slightly different way and help creating the formulas with less code than before:

outcomes <- c("wt", "mpg", "hp", "disp")
exposures <- c("gear", "vs", "am")
covariate1 <- "drat"
covariate2 <- "qsec"

expand.grid(outcomes, exposures, covariate1, covariate2) %>%
  transmute(frm = paste0(Var1, "~factor(", Var2, ")+", Var3, "+", Var4)) %>%
  group_by(model_id = row_number()) %>%
  mutate(model = map(frm, ~tidy(lm(., data = mtcars)))) %>%
  unnest() %>%
  ungroup()
like image 179
AntoniosK Avatar answered Oct 14 '22 02:10

AntoniosK