I've noticed a lot of examples here that use dplyr::mutate in combination with a function returning multiple outputs to create multiple columns. For example:
library(dplyr)

tmp <- mtcars %>%
  group_by(cyl) %>%
  summarise(min = summary(mpg)[1],
            median = summary(mpg)[3],
            mean = summary(mpg)[4],
            max = summary(mpg)[6])
Such syntax, however, means that the summary function is called 4 times in this example, which does not seem particularly efficient. What ways are there to efficiently assign a list output to a list of column names in summarise or mutate?
For example, from a previous question, "Split a data frame column containing a list into multiple columns using dplyr (or otherwise)", I know that you can assign the output of summary as a list and then split it using do(data.frame(...)). However, this means that you then have to add the column names later, and the syntax is not as pretty.
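For reference, the do(data.frame(...)) route might look roughly like the sketch below. It assumes the named vector returned by summary is turned into a one-row data frame via as.list. The check.names = FALSE argument and the by_cyl name are my own additions for illustration, and do() is superseded in recent dplyr versions, so treat this as a sketch rather than a recommendation.

```r
library(dplyr)

# summary() is evaluated once per group; as.list() + data.frame() spread
# its named vector into one column per statistic.
# check.names = FALSE is an assumption added here so that names such as
# "1st Qu." are not altered by data.frame().
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  do(data.frame(as.list(summary(.$mpg)), check.names = FALSE)) %>%
  ungroup()

by_cyl
```

This gives one row per cyl group, with the grouping column followed by the six summary statistics.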
This can also be accomplished using tidyr::nest and purrr::map. Note that the output returned by summary needs to be converted from a named vector to a data.frame or tibble. I'm using dplyr::bind_rows below to accomplish this, but data.frame(as.list(summary(.$mpg))) could equally be used instead.
suppressWarnings(library(tidyverse))
mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  summarise(stats = map(data, ~ bind_rows(summary(.$mpg)))) %>%
  unnest(stats)
#> # A tibble: 3 x 7
#>     cyl Min.    `1st Qu.` Median  Mean     `3rd Qu.` Max.
#>   <dbl> <table> <table>   <table> <table>  <table>   <table>
#> 1     4 21.4    22.80     26.0    26.66364 30.40     33.9
#> 2     6 17.8    18.65     19.7    19.74286 21.00     21.4
#> 3     8 10.4    14.40     15.2    15.10000 16.25     19.2
Created on 2021-04-19 by the reprex package (v0.3.0)
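For completeness, here is a rough sketch of the data.frame(as.list(...)) variant mentioned above, in case bind_rows does not accept the summary object in your dplyr version. The use of mutate plus select instead of summarise, the check.names = FALSE argument, and the stats_by_cyl name are my own choices for illustration, not part of the original approach.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Nest the data per cylinder group, compute summary() once per group,
# and convert its named vector into a one-row data frame.
# check.names = FALSE is an assumption added here to keep names such as
# "1st Qu." unchanged.
stats_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  ungroup() %>%
  mutate(stats = map(data, ~ data.frame(as.list(summary(.x$mpg)),
                                        check.names = FALSE))) %>%
  select(-data) %>%
  unnest(stats)

stats_by_cyl
```

As with the bind_rows version, summary is called only once per group, and the column names come straight from the names of the summary vector.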