Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using mutate(across(...)) with purrr::map

Tags:

r

dplyr

purrr

I'm having trouble figuring out how to use purrr::map() with mutate(across(...)).

I want to do a linear model and pull out the estimate for the slope of multiple columns as predicted by a single column.

Here is what I'm attempting with an example data set:

mtcars %>%
  mutate(across(-mpg), 
    map(.x, lst(slope = ~lm(.x ~ mpg, data = .x) %>% 
          tidy() %>% 
          filter(term != "(Intercept") %>% 
            pull(estimate)
  )))

The output I'm looking for would be new columns for each non-mpg column with _slope appended to the name, ie cyl_slope

In my actual data, I'll be grouping by another variable as well in case that matters, as I need the slope for each group for each predicted variable. I have this working in a standard mutate doing one variable at a time as follows:

df %>% 
  group_by(unitid) %>% 
  nest() %>% 

  mutate(tuition_and_fees_as_pct_total_rev_slope = map_dbl(data, ~lm(tuition_and_fees_as_pct_total_rev ~ year, data = .x) %>%
               tidy() %>%
                 filter(term == "year") %>%
                 pull(estimate)
               ))

So:

  1. I think my issue is how to pass the column name being predicted into the lm
  2. I don't know if the solution requires nesting or not, so it would be appreciated if in the mtcars example that is considered.
like image 362
jzadra Avatar asked Apr 28 '26 11:04

jzadra


2 Answers

If we wanted to do lm on all other columns with independent variable as 'mpg', one option is to loop over the column names of the 'mtcars' except the 'mpg', create the formula with reformulate, apply the lm, convert to a tidy format, filter out the 'Intercept' and select the 'estimate' column

library(dplyr)
library(tidyr)
library(broom)
map_dfc(setdiff(names(mtcars), 'mpg'), ~ 
   lm(reformulate('mpg', response = .x), data = mtcars) %>%
     tidy %>% 
     filter(term != "(Intercept)") %>%
     select(estimate))

-output

# A tibble: 1 x 10
#   estimate...1 estimate...2 estimate...3 estimate...4 estimate...5 estimate...6 estimate...7 estimate...8 estimate...9 estimate...10
#      <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>         <dbl>
#1       -0.253        -17.4        -8.83       0.0604       -0.141        0.124       0.0555       0.0497       0.0588        -0.148

Or this can be done more easily with a matrix as dependent

library(stringr)
lm(as.matrix(mtcars[setdiff(names(mtcars), "mpg")]) ~ mpg, 
       data = mtcars) %>% 
    tidy %>% 
    filter(term != "(Intercept)") %>%
    select(response, estimate) %>%
    mutate(response = str_c(response, '_slope'))

-output

# A tibble: 10 x 2
#   response   estimate
#   <chr>         <dbl>
# 1 cyl_slope   -0.253 
# 2 disp_slope -17.4   
# 3 hp_slope    -8.83  
# 4 drat_slope   0.0604
# 5 wt_slope    -0.141 
# 6 qsec_slope   0.124 
# 7 vs_slope     0.0555
# 8 am_slope     0.0497
# 9 gear_slope   0.0588
#10 carb_slope  -0.148 

Or another option is summarise with across

mtcars %>%
     summarise(across(-mpg, ~ list(lm(reformulate('mpg', 
              response = cur_column())) %>%
                   tidy %>%
                   filter(term != "(Intercept)") %>%
                   pull(estimate)), .names = "{.col}_slope")) %>%
     unnest(everything())
# A tibble: 1 x 10
#  cyl_slope disp_slope hp_slope drat_slope wt_slope qsec_slope vs_slope am_slope gear_slope carb_slope
#      <dbl>      <dbl>    <dbl>      <dbl>    <dbl>      <dbl>    <dbl>    <dbl>      <dbl>      <dbl>
#1    -0.253      -17.4    -8.83     0.0604   -0.141      0.124   0.0555   0.0497     0.0588     -0.148
like image 79
akrun Avatar answered May 01 '26 00:05

akrun


One option could be:

map_dfr(.x = names(select(mtcars, -c(mpg, vs))),
        ~ mtcars %>%
         group_by(vs) %>%
         nest() %>%
         mutate(variable = .x,
                estimate = map_dbl(data, function(y) lm(!!sym(.x) ~ mpg, data = y) %>% 
                                    tidy() %>%
                                    filter(term != "(Intercept)") %>%
                                    pull(estimate))) %>%
         select(-data))

     vs variable estimate
   <dbl> <chr>       <dbl>
 1     0 cyl       -0.242 
 2     1 cyl       -0.116 
 3     0 disp     -22.5   
 4     1 disp      -8.01  
 5     0 hp       -10.1   
 6     1 hp        -3.26  
 7     0 drat       0.0748
 8     1 drat       0.0529
 9     0 wt        -0.192 
10     1 wt        -0.113 
11     0 qsec      -0.0357
12     1 qsec      -0.0432
13     0 am         0.0742
14     1 am         0.0710
15     0 gear       0.114 
16     1 gear       0.0492
17     0 carb      -0.0883
18     1 carb      -0.0790
like image 42
tmfmnk Avatar answered May 01 '26 00:05

tmfmnk



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!