Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Use broom and tidyverse to run regressions on different dependent variables

I'm looking for a Tidyverse / broom solution that can solve this puzzle:

Let's say I have different DVs and a specific set of IVS and I want to perform a regression that considers every DV and this specific set of IVs. I know I can use something like for i in or apply family, but I really want to run that using tidyverse.

The following code works as an example

ds <- data.frame(income = rnorm(100, mean=1000,sd=200),
                 happiness = rnorm(100, mean = 6, sd=1),
                 health = rnorm(100, mean=20, sd = 3),
                 sex = c(0,1),
                 faculty = c(0,1,2,3))

mod1 <- lm(income ~ sex + faculty, ds)
mod2 <- lm(happiness ~ sex + faculty, ds)
mod3 <- lm(health ~ sex + faculty, ds)
summary(mod1)
summary(mod2)
summary(mod3)

Income, happiness, and health are DVs. Sex and Faculty are IVs and they will be used for all regressions.

That was the closest I found

Let me know If I need to clarify my question. Thanks.

like image 690
Luis Avatar asked Aug 01 '18 20:08

Luis


People also ask

Can you run a regression with multiple dependent variables?

Yes, it is possible. What you're interested is is called "Multivariate Multiple Regression" or just "Multivariate Regression". I don't know what software you are using, but you can do this in R. Here's a link that provides examples.

Can you do regression with categorical dependent variables?

Categorical variables require special attention in regression analysis because, unlike dichotomous or continuous variables, they cannot by entered into the regression equation just as they are. Instead, they need to be recoded into a series of variables which can then be entered into the regression model.

Can you do multiple regression with one variable?

In simple linear regression, we will find the correlation between one dependent and independent variable this is called linear regression with one variable. If you have many(n) independent variables and it is called multiple linear regression.


Video Answer


3 Answers

As you have different dependent variables but the same independent, you can form a matrix of these and pass to lm.

mod = lm(cbind(income, happiness, health) ~ sex + faculty, ds)

And I think broom::tidy works

library(broom)
tidy(mod)

#    response        term      estimate  std.error  statistic      p.value
# 1    income (Intercept) 1019.35703873 31.0922529 32.7849205 2.779199e-54
# 2    income         sex  -54.40337314 40.1399258 -1.3553431 1.784559e-01
# 3    income     faculty   19.74808081 17.9511206  1.1001030 2.740100e-01
# 4 happiness (Intercept)    5.97334562  0.1675340 35.6545278 1.505026e-57
# 5 happiness         sex    0.05345555  0.2162855  0.2471528 8.053124e-01
# 6 happiness     faculty   -0.02525431  0.0967258 -0.2610918 7.945753e-01
# 7    health (Intercept)   19.76489553  0.5412676 36.5159396 1.741411e-58
# 8    health         sex    0.32399380  0.6987735  0.4636607 6.439296e-01
# 9    health     faculty    0.10808545  0.3125010  0.3458723 7.301877e-01
like image 192
user20650 Avatar answered Oct 18 '22 15:10

user20650


Another method is to gather the dependent variables and use a grouped data frame to fit the models with do. This is the method explained in the broom and dplyr vignette.

library(tidyverse)
library(broom)
ds <- data.frame(
  income = rnorm(100, mean = 1000, sd = 200),
  happiness = rnorm(100, mean = 6, sd = 1),
  health = rnorm(100, mean = 20, sd = 3),
  sex = c(0, 1),
  faculty = c(0, 1, 2, 3)
)
ds %>%
  gather(dv_name, dv_value, income:health) %>%
  group_by(dv_name) %>%
  do(tidy(lm(dv_value ~ sex + faculty, data = .)))
#> # A tibble: 9 x 6
#> # Groups:   dv_name [3]
#>   dv_name   term         estimate std.error statistic  p.value
#>   <chr>     <chr>           <dbl>     <dbl>     <dbl>    <dbl>
#> 1 happiness (Intercept)     6.25      0.191    32.7   3.14e-54
#> 2 happiness sex             0.163     0.246     0.663 5.09e- 1
#> 3 happiness faculty        -0.172     0.110    -1.56  1.23e- 1
#> 4 health    (Intercept)    20.1       0.524    38.4   1.95e-60
#> 5 health    sex             0.616     0.677     0.909 3.65e- 1
#> 6 health    faculty        -0.653     0.303    -2.16  3.36e- 2
#> 7 income    (Intercept)  1085.       32.8      33.0   1.43e-54
#> 8 income    sex           -12.9      42.4      -0.304 7.62e- 1
#> 9 income    faculty       -25.1      19.0      -1.32  1.89e- 1

Created on 2018-08-01 by the reprex package (v0.2.0).

like image 34
Calum You Avatar answered Oct 18 '22 15:10

Calum You


We can loop through the column names that are dependent variables, use paste to create the formula to be passed into lm and get the summary statistics with tidy (from broom)

library(tidyverse)
library(broom)
map(names(ds)[1:3], ~ 
            lm(formula(paste0(.x, "~", 
                paste(names(ds)[4:5], collapse=" + "))), data = ds) %>%
               tidy)

If we want it in a single data.frame with a column identifier for dependent variable,

map_df(set_names(names(ds)[1:3]), ~ 
      lm(formula(paste0(.x, "~", 
        paste(names(ds)[4:5], collapse=" + "))), data = ds) %>%
   tidy, .id = "Dep_Variable")
like image 35
akrun Avatar answered Oct 18 '22 17:10

akrun