
Using dplyr summarise() for specific columns within purrr map() with grouped data

Tags: r, dplyr, purrr

I have a problem I'm trying to solve, and I can't seem to find a succinct solution. There are a few similar questions on SO, but nothing that quite fits.

Take some sample data:

library(dplyr)

# 10 rows: two grouping factors, three numeric variables, two character columns
dat <- tibble(
  group1 = factor(sample(c("one", "two"), 10, replace = TRUE)),
  group2 = factor(sample(c("alpha", "beta"), 10, replace = TRUE)),
  var1 = rnorm(10, 20, 2),
  var2 = rnorm(10, 20, 2),
  var3 = rnorm(10, 20, 2),
  other1 = sample(c("a", "b", "c"), 10, replace = TRUE),
  other2 = sample(c("a", "b", "c"), 10, replace = TRUE)
)

I would like to summarise just the numeric variables (i.e. ignoring other1 and other2), but have the output grouped by group1 and group2.

I have tried something like this, but it returns an error as it attempts to apply my summarise() functions to the grouping variables too.

dat %>%
  group_by(group1, group2) %>%
  select(where(is.numeric)) %>%
  map(~ .x %>%
        filter(!is.na(.x)) %>%
        summarise(mean = mean(.x),
                  sd = sd(.x),
                  median = median(.x),
                  q1 = quantile(.x, p = .25),
                  q3 = quantile(.x, p = .75))
  )

My expected output would be something like

  group1 group2  mean    sd median    q1    q3
  <fct>  <fct>  <dbl> <dbl>  <dbl> <dbl> <dbl>
1 one    alpha      ?     ?      ?     ?     ?
2 one    beta       ?     ?      ?     ?     ?
3 two    alpha      ?     ?      ?     ?     ?
4 two    beta       ?     ?      ?     ?     ?

Any solutions would be greatly appreciated.

Thanks, Sam

Sam asked Sep 01 '25 20:09


1 Answer

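Try summarize() with across(), selecting the numeric columns with where(is.numeric). across() skips the grouping columns, so only var1, var2 and var3 are summarised:

dat %>%
  group_by(group1, group2) %>%
  summarize(across(where(is.numeric),
                   list(sd = sd,
                        mean = mean,
                        median = median,
                        q1 = function(x) quantile(x, .25),
                        q3 = function(x) quantile(x, .75))))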

  group1 group2 var1_sd var1_mean var1_median var1_q1 var1_q3 var2_sd var2_mean var2_median var2_q1 var2_q3 var3_sd
  <fct>  <fct>    <dbl>     <dbl>       <dbl>   <dbl>   <dbl>   <dbl>     <dbl>       <dbl>   <dbl>   <dbl>   <dbl>
1 one    alpha    4.06       20.6        19.3    18.3    22.2   1.12       17.9        17.3    17.2    18.2  1.09  
2 one    beta     0.726      18.7        18.7    18.4    18.9   0.348      18.8        18.8    18.7    18.9  0.604 
3 two    alpha    1.31       19.9        20.0    19.3    20.6   1.10       17.8        18.3    17.4    18.5  0.624 
4 two    beta     0.777      21.2        21.2    21.0    21.5   1.13       19.6        19.6    19.2    20.0  0.0161
Waldi answered Sep 05 '25 02:09
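The answer above returns one wide row per group, with a column for every variable/statistic pair. If the layout in the question is wanted instead (a single set of mean, sd, median, q1 and q3 columns), one possible variation is to reshape to long format first and add the variable name as an extra grouping column. This is only a sketch and not part of the original answer: it assumes tidyr is installed, and the seed, the names long_summary, variable and value, and the na.rm = TRUE handling are illustrative choices.

```r
library(dplyr)
library(tidyr)

set.seed(42)  # assumption: fixed seed so the sample data is reproducible

dat <- tibble(
  group1 = factor(sample(c("one", "two"), 10, replace = TRUE)),
  group2 = factor(sample(c("alpha", "beta"), 10, replace = TRUE)),
  var1 = rnorm(10, 20, 2),
  var2 = rnorm(10, 20, 2),
  var3 = rnorm(10, 20, 2),
  other1 = sample(c("a", "b", "c"), 10, replace = TRUE),
  other2 = sample(c("a", "b", "c"), 10, replace = TRUE)
)

long_summary <- dat %>%
  # Stack the numeric columns: one row per (group1, group2, variable, value)
  pivot_longer(where(is.numeric), names_to = "variable", values_to = "value") %>%
  group_by(group1, group2, variable) %>%
  summarise(
    mean   = mean(value, na.rm = TRUE),
    sd     = sd(value, na.rm = TRUE),
    median = median(value, na.rm = TRUE),
    # names = FALSE drops the "25%" / "75%" labels from the result
    q1     = quantile(value, probs = 0.25, na.rm = TRUE, names = FALSE),
    q3     = quantile(value, probs = 0.75, na.rm = TRUE, names = FALSE),
    .groups = "drop"
  )

long_summary
```

The result has one row per combination of group1, group2 and variable. With only 10 rows of sample data some groups may contain a single observation, in which case sd is NA.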