Efficient assignment of a function with multiple outputs in dplyr mutate or summarise




I've noticed a lot of examples here that use dplyr::mutate (or summarise) in combination with a function returning multiple outputs to create multiple columns. For example:

library(dplyr)

tmp <- mtcars %>%
    group_by(cyl) %>%
    summarise(min = summary(mpg)[1],
              median = summary(mpg)[3],
              mean = summary(mpg)[4],
              max = summary(mpg)[6])

Such syntax, however, means that the summary function is called four times in this example, which does not seem particularly efficient. What ways are there to efficiently assign a list output to a list of column names in summarise or mutate?

For example, from a previous question (Split a data frame column containing a list into multiple columns using dplyr (or otherwise)), I know that you can store the output of summary as a list and then split it using do(data.frame(...)). However, this means that you then have to add the column names afterwards, and the syntax is not as pretty.
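For reference, a do()-based version might look roughly like the following. This is only a sketch of the general pattern, not the code from the linked question: the name res_do is made up for illustration, and do() is superseded in current dplyr releases, although it is still available.

```r
library(dplyr)

res_do <- mtcars %>%
  group_by(cyl) %>%
  # summary() is called once per group; its named vector becomes a one-row data frame
  do(data.frame(as.list(summary(.$mpg)), check.names = FALSE)) %>%
  ungroup() %>%
  # the desired column names still have to be set in a separate step
  select(cyl, min = `Min.`, median = Median, mean = Mean, max = `Max.`)
```

The separate select() step at the end is the "add the column names later" cost described above.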

1 Answer

This can also be accomplished using tidyr::nest and purrr::map. Note that the output returned by summary needs to be converted from a named vector to a data.frame or tibble. I'm using dplyr::bind_rows below to accomplish this, but data.frame(as.list(summary(.$mpg))) could equally be used instead.


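library(dplyr)
library(tidyr)
library(purrr)

mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  # summary() runs once per group; bind_rows() turns its named vector into a one-row tibble
  summarise(stats = map(data, ~ bind_rows(summary(.$mpg)))) %>%
  unnest(stats)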
#> # A tibble: 3 x 7
#>     cyl Min.    `1st Qu.` Median  Mean     `3rd Qu.` Max.   
#>   <dbl> <table> <table>   <table> <table>  <table>   <table>
#> 1     4 21.4    22.80     26.0    26.66364 30.40     33.9   
#> 2     6 17.8    18.65     19.7    19.74286 21.00     21.4   
#> 3     8 10.4    14.40     15.2    15.10000 16.25     19.2

Created on 2021-04-19 by the reprex package (v0.3.0)
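The data.frame(as.list(summary(...))) conversion mentioned above can also be used without the nesting step. The following is a minimal sketch, assuming dplyr 1.0.0 or later, where summarise() unpacks an unnamed data frame result into separate columns. summary_row is a hypothetical helper name introduced here for illustration; it is not part of any package.

```r
library(dplyr)

# Hypothetical helper: call summary() once and return its values as a
# one-row data frame. check.names = FALSE keeps names such as "1st Qu."
# instead of converting them to syntactic names.
summary_row <- function(x) {
  data.frame(as.list(summary(x)), check.names = FALSE)
}

res <- mtcars %>%
  group_by(cyl) %>%
  # the unnamed data frame returned per group is spread into columns
  summarise(summary_row(mpg))
```

Here summary() is evaluated once per group, and the column names come directly from the names of its output, so no separate naming step is needed unless different names are wanted.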

