Accessing grouping variables in purrr::map() with nested dataframes

Question

I'm using tidyr::nest() together with purrr::map() (and its variants) to split a data.frame into groups and then do some work on each subset. Consider the following example, and please ignore the fact that I don't need nest() and map() for this (it is an oversimplified example):

library(dplyr)
library(purrr)
library(tidyr)

mtcars %>% 
  group_by(cyl) %>%
  nest() %>%
  mutate(
    wt_mean = map_dbl(data,~mean(.x$wt))
  )

# A tibble: 8 x 4
    cyl  gear data               cly2
  <dbl> <dbl> <list>            <dbl>
1     6     4 <tibble [4 x 9]>      6
2     4     4 <tibble [8 x 9]>      4
3     6     3 <tibble [2 x 9]>      6
4     8     3 <tibble [12 x 9]>     8
5     4     3 <tibble [1 x 9]>      4
6     4     5 <tibble [2 x 9]>      4
7     8     5 <tibble [2 x 9]>      8
8     6     5 <tibble [1 x 9]>      6

Usually when I do this type of operation, I need access to the grouping variable (cyl in this case) within map(). But inside map() the grouping variables appear as vectors whose length equals the number of rows in the nested data frame, so they cannot be used directly as a per-group value.

Is there a way I could run the following operation? I want the mean of wt divided by the number of cylinders (cyl) for each group (i.e. each row).

mtcars %>% 
  group_by(cyl,gear) %>%
  nest() %>%
  mutate(
    wt_mean = map_dbl(data,~mean(.x$wt)/cyl)
  )


Error in mutate_impl(.data, dots) : 
  Evaluation error: Result 1 is not a length 1 atomic vector.

zack · Accepted Answer

Take cyl out of the map call:

mtcars %>% 
  group_by(cyl,gear) %>%
  nest() %>%
  mutate(
    wt_mean = map_dbl(data, ~mean(.x$wt)) / cyl
  )

# A tibble: 8 x 4
    cyl  gear data              wt_mean
  <dbl> <dbl> <list>              <dbl>
1     6     4 <tibble [4 x 9]>    0.516
2     4     4 <tibble [8 x 9]>    0.595
3     6     3 <tibble [2 x 9]>    0.556
4     8     3 <tibble [12 x 9]>   0.513
5     4     3 <tibble [1 x 9]>    0.616
6     4     5 <tibble [2 x 9]>    0.457
7     8     5 <tibble [2 x 9]>    0.421
8     6     5 <tibble [1 x 9]>    0.462

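map_dbl sees cyl as a length-8 vector because nest() removes the grouping from the data frame, so cyl refers to the whole column. Using cyl inside the map_* lambda (as in the OP's example) therefore makes each of the 8 calls return a length-8 vector, which is why map_dbl complains that the result is not a length 1 atomic vector.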
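
A minimal sketch of that behaviour, assuming the nested tibble is ungrouped. The explicit ungroup() and the names nested and lens are additions for illustration, not part of the original answer; newer tidyr releases may keep the grouping after nest(), in which case the lambda would see a single value per group instead.

```r
library(dplyr, warn.conflicts = FALSE)
library(purrr)
library(tidyr)

# Nest by cyl and gear, then drop the grouping explicitly (assumption:
# this mimics the ungrouped result that produced the error above).
nested <- mtcars %>%
  group_by(cyl, gear) %>%
  nest() %>%
  ungroup()

# Inside the lambda, cyl is the whole column, not the current row's value.
lens <- nested %>%
  mutate(n_cyl = map_int(data, ~ length(cyl))) %>%
  pull(n_cyl)

# TRUE when every call saw one cyl value per row of the nested tibble.
all(lens == nrow(nested))
```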

Two other approaches:

Both give the same result as above but keep the grouping variable inside the map_* call, per the OP's specs:

Regrouping after `nest`

mtcars %>% 
  group_by(cyl,gear) %>%
  nest() %>%
  group_by(cyl, gear) %>%
  mutate(wt_mean = map_dbl(data,~mean(.x$wt)/cyl))

`map2` for iterating over `cyl`

mtcars %>% 
  group_by(cyl,gear) %>%
  nest() %>%
  mutate(wt_mean = map2_dbl(data, cyl,~mean(.x$wt)/ .y))
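
As a quick sanity check, the three variants can be compared on the same nested tibble. This is only a sketch: the object names (nested, outside, regrouped, paired) and the explicit ungroup() calls are assumptions added here so the comparison does not depend on whether nest() keeps the grouping.

```r
library(dplyr, warn.conflicts = FALSE)
library(purrr)
library(tidyr)

# Shared starting point, explicitly ungrouped (assumption for this sketch).
nested <- mtcars %>%
  group_by(cyl, gear) %>%
  nest() %>%
  ungroup()

# 1. Divide outside the map call.
outside <- nested %>%
  mutate(wt_mean = map_dbl(data, ~ mean(.x$wt)) / cyl)

# 2. Regroup so that cyl has length 1 inside each group.
regrouped <- nested %>%
  group_by(cyl, gear) %>%
  mutate(wt_mean = map_dbl(data, ~ mean(.x$wt) / cyl)) %>%
  ungroup()

# 3. Iterate over data and cyl in parallel.
paired <- nested %>%
  mutate(wt_mean = map2_dbl(data, cyl, ~ mean(.x$wt) / .y))

# Both comparisons should report TRUE if the approaches agree.
all.equal(outside$wt_mean, regrouped$wt_mean)
all.equal(outside$wt_mean, paired$wt_mean)
```

The map2 variant makes the pairing between each nested tibble and its cyl value explicit, so it does not rely on the grouping state of the nested tibble.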

Ratnanil · Answer

In the new release of dplyr, 0.8.0, you can now use group_map, which I find very handy for this use case. This is the example by GitHub user @yutannihilation:

library(dplyr, warn.conflicts = FALSE)

mtcars %>% 
  group_by(cyl) %>%
  group_map(function(data, group_info) {
    tibble::tibble(wt_mean = mean(data$wt) / group_info$cyl)
  })
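
In later dplyr releases (0.8.1 onwards, as far as I can tell), group_map() returns a list and group_modify() took over the behaviour shown above of returning a grouped tibble. A hedged sketch of the same idea under that assumption, with res as an illustrative name:

```r
library(dplyr, warn.conflicts = FALSE)

# Assumes a dplyr version that provides group_modify().
# The function receives the group's rows (data) and a one-row tibble
# holding the grouping values (group_info).
res <- mtcars %>%
  group_by(cyl) %>%
  group_modify(function(data, group_info) {
    tibble::tibble(wt_mean = mean(data$wt) / group_info$cyl)
  })

res
```

The result should have one row per cyl value, with the grouping column followed by wt_mean.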

Tags: r, dplyr, purrr, tidyr
