I'm looking for a Tidyverse / broom solution that can solve this puzzle: Let's say I have different DVs and a specific set of IVS and I want to perform a regression that considers every DV and this specific set of IVs. I know I can use something like for i in or apply family, but I really want to run that using tidyverse. The following code works as an example <pre class="prettyprint"><code>ds <- data.frame(income = rnorm(100, mean=1000,sd=200), happiness = rnorm(100, mean = 6, sd=1), health = rnorm(100, mean=20, sd = 3), sex = c(0,1), faculty = c(0,1,2,3)) mod1 <- lm(income ~ sex + faculty, ds) mod2 <- lm(happiness ~ sex + faculty, ds) mod3 <- lm(health ~ sex + faculty, ds) summary(mod1) summary(mod2) summary(mod3) </code></pre> Income, happiness, and health are DVs. Sex and Faculty are IVs and they will be used for all regressions. That was the closest I found Let me know If I need to clarify my question. Thanks.

Another method is to <code>gather</code> the dependent variables and use a grouped data frame to fit the models with <code>do</code>. This is the method explained in the broom and dplyr vignette. <pre class="prettyprint lang-r prettyprint-override"><code>library(tidyverse) library(broom) ds <- data.frame( income = rnorm(100, mean = 1000, sd = 200), happiness = rnorm(100, mean = 6, sd = 1), health = rnorm(100, mean = 20, sd = 3), sex = c(0, 1), faculty = c(0, 1, 2, 3) ) ds %>% gather(dv_name, dv_value, income:health) %>% group_by(dv_name) %>% do(tidy(lm(dv_value ~ sex + faculty, data = .))) #> # A tibble: 9 x 6 #> # Groups: dv_name [3] #> dv_name term estimate std.error statistic p.value #> <chr> <chr> <dbl> <dbl> <dbl> <dbl> #> 1 happiness (Intercept) 6.25 0.191 32.7 3.14e-54 #> 2 happiness sex 0.163 0.246 0.663 5.09e- 1 #> 3 happiness faculty -0.172 0.110 -1.56 1.23e- 1 #> 4 health (Intercept) 20.1 0.524 38.4 1.95e-60 #> 5 health sex 0.616 0.677 0.909 3.65e- 1 #> 6 health faculty -0.653 0.303 -2.16 3.36e- 2 #> 7 income (Intercept) 1085. 32.8 33.0 1.43e-54 #> 8 income sex -12.9 42.4 -0.304 7.62e- 1 #> 9 income faculty -25.1 19.0 -1.32 1.89e- 1 </code></pre> Created on 2018-08-01 by the reprex package (v0.2.0).

Use broom and tidyverse to run regressions on different dependent variables

Tags:

loops

r

regression

tidyverse

broom

I'm looking for a Tidyverse / broom solution that can solve this puzzle:

Let's say I have different DVs and a specific set of IVS and I want to perform a regression that considers every DV and this specific set of IVs. I know I can use something like for i in or apply family, but I really want to run that using tidyverse.

The following code works as an example

ds <- data.frame(income = rnorm(100, mean=1000,sd=200),
                 happiness = rnorm(100, mean = 6, sd=1),
                 health = rnorm(100, mean=20, sd = 3),
                 sex = c(0,1),
                 faculty = c(0,1,2,3))

mod1 <- lm(income ~ sex + faculty, ds)
mod2 <- lm(happiness ~ sex + faculty, ds)
mod3 <- lm(health ~ sex + faculty, ds)
summary(mod1)
summary(mod2)
summary(mod3)

Income, happiness, and health are DVs. Sex and Faculty are IVs and they will be used for all regressions.

That was the closest I found

Let me know If I need to clarify my question. Thanks.

690

asked Aug 01 '18 20:08

Luis

Video Answer

3 Answers

As you have different dependent variables but the same independent, you can form a matrix of these and pass to lm.

mod = lm(cbind(income, happiness, health) ~ sex + faculty, ds)

And I think broom::tidy works

library(broom)
tidy(mod)

#    response        term      estimate  std.error  statistic      p.value
# 1    income (Intercept) 1019.35703873 31.0922529 32.7849205 2.779199e-54
# 2    income         sex  -54.40337314 40.1399258 -1.3553431 1.784559e-01
# 3    income     faculty   19.74808081 17.9511206  1.1001030 2.740100e-01
# 4 happiness (Intercept)    5.97334562  0.1675340 35.6545278 1.505026e-57
# 5 happiness         sex    0.05345555  0.2162855  0.2471528 8.053124e-01
# 6 happiness     faculty   -0.02525431  0.0967258 -0.2610918 7.945753e-01
# 7    health (Intercept)   19.76489553  0.5412676 36.5159396 1.741411e-58
# 8    health         sex    0.32399380  0.6987735  0.4636607 6.439296e-01
# 9    health     faculty    0.10808545  0.3125010  0.3458723 7.301877e-01

192

answered Oct 18 '22 15:10

user20650

Another method is to gather the dependent variables and use a grouped data frame to fit the models with do. This is the method explained in the broom and dplyr vignette.

library(tidyverse)
library(broom)
ds <- data.frame(
  income = rnorm(100, mean = 1000, sd = 200),
  happiness = rnorm(100, mean = 6, sd = 1),
  health = rnorm(100, mean = 20, sd = 3),
  sex = c(0, 1),
  faculty = c(0, 1, 2, 3)
)
ds %>%
  gather(dv_name, dv_value, income:health) %>%
  group_by(dv_name) %>%
  do(tidy(lm(dv_value ~ sex + faculty, data = .)))
#> # A tibble: 9 x 6
#> # Groups:   dv_name [3]
#>   dv_name   term         estimate std.error statistic  p.value
#>   <chr>     <chr>           <dbl>     <dbl>     <dbl>    <dbl>
#> 1 happiness (Intercept)     6.25      0.191    32.7   3.14e-54
#> 2 happiness sex             0.163     0.246     0.663 5.09e- 1
#> 3 happiness faculty        -0.172     0.110    -1.56  1.23e- 1
#> 4 health    (Intercept)    20.1       0.524    38.4   1.95e-60
#> 5 health    sex             0.616     0.677     0.909 3.65e- 1
#> 6 health    faculty        -0.653     0.303    -2.16  3.36e- 2
#> 7 income    (Intercept)  1085.       32.8      33.0   1.43e-54
#> 8 income    sex           -12.9      42.4      -0.304 7.62e- 1
#> 9 income    faculty       -25.1      19.0      -1.32  1.89e- 1

Created on 2018-08-01 by the reprex package (v0.2.0).

answered Oct 18 '22 15:10

Calum You

We can loop through the column names that are dependent variables, use paste to create the formula to be passed into lm and get the summary statistics with tidy (from broom)

library(tidyverse)
library(broom)
map(names(ds)[1:3], ~ 
            lm(formula(paste0(.x, "~", 
                paste(names(ds)[4:5], collapse=" + "))), data = ds) %>%
               tidy)

If we want it in a single data.frame with a column identifier for dependent variable,

map_df(set_names(names(ds)[1:3]), ~ 
      lm(formula(paste0(.x, "~", 
        paste(names(ds)[4:5], collapse=" + "))), data = ds) %>%
   tidy, .id = "Dep_Variable")

answered Oct 18 '22 17:10

akrun

Related questions
                            
                                Extract components from mixed model (lme4) formula
                            
                                How to apply function returning data.frames with factors to sequence
                            
                                Find and replace missing values with row mean
                            
                                Using shapiro.test on multiple columns in a data frame
                            
                                Proximity Matrix - Random Forest , R
                            
                                How is data passed from reactive Shiny expression to ggvis plot?
                            
                                R: apply list of functions on list of argument by name
                            
                                match two data.frames based on multiple columns
                            
                                Adding 15 business days in lubridate
                            
                                Assign color to 2 different geoms and get 2 different legends
                            
                                Fill NA values with the trailing row value times a growth rate?
                            
                                R - Set execution time limit in loop
                            
                                Convert negative values to zero in dataframe in R
                            
                                Caret package findCorrelation() function
                            
                                Replace values in list
                            
                                Find string in data.frame
                            
                                ggplot2 remove axis label
                            
                                updateSelectinput throws a session not found error
                            
                                Visualizing two or more data points where they overlap (ggplot R)
                            
                                Dividing each cell in a data set by the column sum in R

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With