I'm using tidyr::nest() in combination with purrr::map() (and family) to split a data.frame into groups and then do some fancy stuff with each subset. Consider the following example, and please ignore the fact that I don't need nest() and map() to do this (this is an oversimplified example):
library(dplyr)
library(purrr)
library(tidyr)
mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  mutate(
    wt_mean = map_dbl(data, ~ mean(.x$wt))
  )
# A tibble: 8 x 4
    cyl  gear data               cly2
  <dbl> <dbl> <list>            <dbl>
1     6     4 <tibble [4 x 9]>      6
2     4     4 <tibble [8 x 9]>      4
3     6     3 <tibble [2 x 9]>      6
4     8     3 <tibble [12 x 9]>     8
5     4     3 <tibble [1 x 9]>      4
6     4     5 <tibble [2 x 9]>      4
7     8     5 <tibble [2 x 9]>      8
8     6     5 <tibble [1 x 9]>      6
Usually when I do this type of operation, I need access to the grouping variable (cyl in this case) within map(). But these grouping variables appear as vectors whose length corresponds to the number of rows in the nested dataframe, so they are awkward to use directly.
Is there a way I could run the following operation? I want the mean of wt to be divided by the number of cylinders (cyl) per group (i.e. per row).
mtcars %>%
  group_by(cyl, gear) %>%
  nest() %>%
  mutate(
    wt_mean = map_dbl(data, ~ mean(.x$wt) / cyl)
  )
Error in mutate_impl(.data, dots) :
  Evaluation error: Result 1 is not a length 1 atomic vector.
Take cyl out of the map call:
mtcars %>%
  group_by(cyl, gear) %>%
  nest() %>%
  mutate(
    wt_mean = map_dbl(data, ~ mean(.x$wt)) / cyl
  )
# A tibble: 8 x 4
    cyl  gear data              wt_mean
  <dbl> <dbl> <list>              <dbl>
1     6     4 <tibble [4 x 9]>    0.516
2     4     4 <tibble [8 x 9]>    0.595
3     6     3 <tibble [2 x 9]>    0.556
4     8     3 <tibble [12 x 9]>   0.513
5     4     3 <tibble [1 x 9]>    0.616
6     4     5 <tibble [2 x 9]>    0.457
7     8     5 <tibble [2 x 9]>    0.421
8     6     5 <tibble [1 x 9]>    0.462
map_dbl sees cyl as a length-8 vector because nest removes the groups from the data.frame. Using cyl inside the map_* call (as in the OP's example) therefore produces 8 length-8 vectors, one per row, where map_dbl expects a single number for each row.
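To see this concretely, the sketch below inspects the length of each element produced when cyl is referenced inside the lambda. The ungroup() call is an addition that is not in the original code; it is there to make the ungrouped state explicit in case the installed tidyr version keeps the grouping after nest(). The name result_lengths is illustrative.

```r
library(dplyr, warn.conflicts = FALSE)
library(purrr)
library(tidyr)

nested <- mtcars %>%
  group_by(cyl, gear) %>%
  nest() %>%
  ungroup()  # assumption: force the ungrouped state described above

# map() (not map_dbl()) keeps each result whole so its length can be inspected
result_lengths <- nested %>%
  mutate(res = map(data, ~ mean(.x$wt) / cyl)) %>%
  pull(res) %>%
  lengths()

# One length per row; each equals nrow(nested) because cyl is the whole column
print(result_lengths)
```

Every element has as many values as the nested tibble has rows, which is why map_dbl refuses the result.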
Two alternatives give the same result as the solution above while keeping the grouping variable inside the map_* call, per the OP's specs:
Regroup after nest:
mtcars %>%
  group_by(cyl, gear) %>%
  nest() %>%
  group_by(cyl, gear) %>%
  mutate(wt_mean = map_dbl(data, ~ mean(.x$wt) / cyl))
Use map2 to iterate over cyl in parallel with data:
mtcars %>%
  group_by(cyl, gear) %>%
  nest() %>%
  mutate(wt_mean = map2_dbl(data, cyl, ~ mean(.x$wt) / .y))
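As a quick sanity check, the sketch below runs the three variants from this answer on the same nested tibble and compares the resulting wt_mean columns. The object names (outside, regrouped, paired) are illustrative, and the ungroup() calls are additions so that the comparison starts from the same ungrouped state regardless of the tidyr version.

```r
library(dplyr, warn.conflicts = FALSE)
library(purrr)
library(tidyr)

nested <- mtcars %>%
  group_by(cyl, gear) %>%
  nest() %>%
  ungroup()  # assumption: start every variant from an ungrouped tibble

# 1. Divide by cyl outside the map call (vectorised over rows)
outside <- nested %>%
  mutate(wt_mean = map_dbl(data, ~ mean(.x$wt)) / cyl)

# 2. Regroup so that cyl has length 1 inside each group
regrouped <- nested %>%
  group_by(cyl, gear) %>%
  mutate(wt_mean = map_dbl(data, ~ mean(.x$wt) / cyl)) %>%
  ungroup()

# 3. Iterate over data and cyl in parallel
paired <- nested %>%
  mutate(wt_mean = map2_dbl(data, cyl, ~ mean(.x$wt) / .y))

# Both comparisons should be TRUE
print(all.equal(outside$wt_mean, regrouped$wt_mean))
print(all.equal(outside$wt_mean, paired$wt_mean))
```

If more than one grouping variable is needed inside the lambda, the regrouping variant scales without changing the call, whereas map2 is limited to two inputs.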
In the new release of dplyr, 0.8.0, you can now use group_map, which I find very handy for this use case. This is the example by GitHub user @yutannihilation:
library(dplyr, warn.conflicts = FALSE)
mtcars %>%
  group_by(cyl) %>%
  group_map(function(data, group_info) {
    tibble::tibble(wt_mean = mean(data$wt) / group_info$cyl)
  })
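In dplyr 0.8.1 and later, group_map() returns a list of per-group results, and group_modify() is the variant that returns a data frame with the grouping columns attached. A minimal sketch of the same example, assuming one of those later versions is installed (the name res is illustrative):

```r
library(dplyr, warn.conflicts = FALSE)

res <- mtcars %>%
  group_by(cyl) %>%
  # data: the rows of one group; group_info: a one-row tibble with the keys
  group_modify(function(data, group_info) {
    tibble::tibble(wt_mean = mean(data$wt) / group_info$cyl)
  })

# One row per cyl value, with columns cyl and wt_mean
print(res)
```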