dplyr-0.6.0 programming unquoting

I'm trying to write a simple wrapper around summarise() that summarises arbitrary variables by arbitrary groups. I've made progress now that I have the correct library version loaded, but I'm confused (again) about how to unquote arguments that hold multiple values.

I currently have the following function...

table_summary <- function(df     = .,
                          id     = individual_id,
                          select = c(),
                          group  = site,
                          ...){
    ## Quote all arguments (see http://dplyr.tidyverse.org/articles/programming.html)
    quo_id     <- enquo(id)
    quo_select <- enquo(select)
    quo_group  <- enquo(group)
    ## Subset the data
    df <- df %>%
          dplyr::select(!!quo_id, !!quo_select, !!quo_group) %>%
          unique()
    ## gather() data, just in case there is > 1 variable selected to be summarised
    df <- df %>%
          gather(key = variable, value = value, !!quo_select)
    ## Summarise selected variables by specified groups
    results <- df %>%
           group_by(!!quo_group, variable) %>%
           summarise(n    = n(),
                     mean = mean(value, na.rm = TRUE))
    return(results)
}

Which gets most of the way there and works if I specify one grouping variable...

> table_summary(df = mtcars, id = model, select = c(mpg), group = gear)
# A tibble: 3 x 4
# Groups:   c(gear) [?]
       gear variable     n     mean
      <dbl>    <chr> <int>    <dbl>
1         3      mpg    15 16.10667
2         4      mpg    12 24.53333
3         5      mpg     5 21.38000

...but fails at the group_by(!!quo_group, variable) when I specify more than one group = c(gear, hp)...

> mtcars$model <- rownames(mtcars)
> table_summary(df = mtcars, id = model, select = c(mpg), group = c(gear, hp))
Error in mutate_impl(.data, dots) : 
  Column `c(gear, hp)` must be length 32 (the group size) or one, not 64

I went back and re-read the programming dplyr documentation, where I read that you can capture multiple variables using quos() instead of enquo() and then unquote-splice them with !!!, so I tried...

table_summary <- function(df     = .,
                          id     = individual_id,
                          select = c(),
                          group  = c(),
                          digits = 3,
                          ...){
    ## Quote all arguments (see http://dplyr.tidyverse.org/articles/programming.html)
    quo_id     <- enquo(id)
    quo_select <- enquo(select)
    quo_group  <- quos(group)  ## Use quos() rather than enquo()
    UQS(quo_group) %>% print() ## Check to see what quo_group holds
    ## Subset the data
    df <- df %>%
          dplyr::select(!!quo_id, !!quo_select, !!!quo_group) %>%
          unique()
    ## gather() data, just in case there is > 1 variable selected to be summarised
    df <- df %>%
          gather(key = variable, value = value, !!quo_select)
    ## Summarise selected variables by specified groups
    results <- df %>%
               group_by(!!!quo_group, variable) %>%
               summarise(n    = n(),
                         mean = mean(value, na.rm = TRUE))
    return(results)
}

...which now fails at the first reference to !!!quo_group within dplyr::select(), regardless of how many variables are specified under group = ...

> table_summary(df = mtcars, id = model, select = c(mpg), group = c(gear))
[[1]]
<quosure: frame>
~group

attr(,"class")
[1] "quosures"
Error in overscope_eval_next(overscope, expr) : object 'gear' not found
> traceback()
17: .Call(rlang_eval, f_rhs(quo), overscope)
16: overscope_eval_next(overscope, expr)
15: FUN(X[[i]], ...)
14: lapply(.x, .f, ...)
13: map(.x[matches], .f, ...)
12: map_if(ind_list, !is_helper, eval_tidy, data = names_list)
11: select_vars(names(.data), !(!(!quos(...))))
10: select.data.frame(., !(!quo_id), !(!quo_select), !(!(!quo_group)))
9: dplyr::select(., !(!quo_id), !(!quo_select), !(!(!quo_group)))
8: function_list[[i]](value)
7: freduce(value, `_function_list`)
6: `_fseq`(`_lhs`)
5: eval(quote(`_fseq`(`_lhs`)), env, env)
4: eval(quote(`_fseq`(`_lhs`)), env, env)
3: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
2: df %>% dplyr::select(!(!quo_id), !(!quo_select), !(!(!quo_group))) %>% 
       unique()
1: table_summary(df = mtcars, id = model, select = c(mpg), group = c(gear))

What seems strange, and what I think is the source of the problem, is that !!!quo_group (i.e. UQS(quo_group)) prints a single quosure ~group rather than quosures for the columns I supplied, which is what happens when I add a print() to the worked example from the documentation...

> my_summarise <- function(df, ...) {
    group_by <- quos(...)
    UQS(group_by) %>% print()
    df %>%
    group_by(!!!group_by) %>%
    summarise(a = mean(a))
  }
> df <- tibble(
    g1 = c(1, 1, 2, 2, 2),
    g2 = c(1, 2, 1, 2, 1),
    a = sample(5), 
    b = sample(5)
  )
> my_summarise(df, g1, g2)
[[1]]
<quosure: global>
~g1

[[2]]
<quosure: global>
~g2

attr(,"class")
[1] "quosures"
# A tibble: 4 x 3
# Groups:   g1 [?]
     g1    g2     a
  <dbl> <dbl> <dbl>
1     1     1   1.0
2     1     2   5.0
3     2     1   2.5
4     2     2   4.0

I'd like to supply the grouping variables explicitly as a named argument to my function, but I decided to test whether it works when they are passed through ... instead:

table_summary <- function(df     = .,
                          id     = individual_id,
                          select = c(),
                          group  = c(),
                          digits = 3,
                          ...){
    ## Quote all arguments (see http://dplyr.tidyverse.org/articles/programming.html)
    quo_id     <- enquo(id)
    quo_select <- enquo(select)
    ## quo_group  <- quos(group)
    quo_group  <- quos(...)
    UQS(quo_group) %>% print()
    ## Subset the data
    df <- df %>%
          dplyr::select(!!quo_id, !!quo_select, !!!quo_group) %>%
          unique()
    ## gather() data, just in case there is > 1 variable selected to be summarised
    df <- df %>%
          gather(key = variable, value = value, !!quo_select)
    ## Summarise selected variables by specified groups
    results <- df %>%
               group_by(!!!quo_group, variable) %>%
               summarise(n    = n(),
                         mean = mean(value, na.rm = TRUE))
    return(results)
}

...but it doesn't: quos() again unquote-splices to NULL, so the variables are neither selected nor grouped by...

> table_summary(df = mtcars, id = model, select = c(mpg), gear, hp)
NULL
# A tibble: 1 x 3
  variable     n     mean
     <chr> <int>    <dbl>
1      mpg    32 20.09062
> table_summary(df = mtcars, id = model, select = c(mpg), gear)
NULL
# A tibble: 1 x 3
  variable     n     mean
     <chr> <int>    <dbl>
1      mpg    32 20.09062
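
A likely reason for the empty result, sketched with base R only: group and digits come before ... in the signature, so unnamed arguments are matched to them positionally and nothing is left over for the dots. The function show_matching is a hypothetical stand-in that copies the argument order of table_summary and returns the matched call without evaluating anything.

```r
# Hypothetical stand-in with the same argument order as table_summary().
show_matching <- function(df, id, select = c(), group = c(), digits = 3, ...) {
  match.call()  # reports which formal each supplied argument was matched to
}

cl <- show_matching(df = mtcars, id = model, select = c(mpg), gear, hp)

matched_names <- names(as.list(cl))[-1]
print(matched_names)       # "df" "id" "select" "group" "digits"
print(deparse(cl$group))   # "gear"
print(deparse(cl$digits))  # "hp"
```

Under this reading, moving ... ahead of group and digits in the signature, or dropping those two arguments, would let gear and hp reach quos(...).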

I've gone through this cycle several times now, checking each method of using enquo() and quos(), but cannot see where I am going wrong despite having read the programming dplyr documentation several times.

asked May 26 '17 13:05

slackline

1 Answer

IIUC your post, you want to supply c(col1, col2) to group_by(). This is not supported by that verb:

group_by(mtcars, c(cyl, am))
#> Error in mutate_impl(.data, dots) :
#>   Column `c(cyl, am)` must be length 32 (the number of rows) or one, not 64

That's because group_by() has mutate semantics, not select semantics. That means that the expressions you supply to group_by() are transformative expressions. This is a surprising but quite handy feature. For example you can group by disp cut into three intervals like this:

group_by(mtcars, cut3 = cut(disp, 3))

This also means that if you supply c(cyl, am), it will concatenate the two columns together and return a vector of length 64, while it was expecting a length of 32 (the number of rows).
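
The length mismatch can be checked outside dplyr with a small base R sketch; the variable names n_rows and combined are only for illustration.

```r
# c() concatenates the two columns end to end instead of pairing them up.
n_rows   <- nrow(mtcars)
combined <- with(mtcars, c(cyl, am))

print(n_rows)            # 32
print(length(combined))  # 64
```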

So your problem is that you want a wrapper to group_by() that has selection semantics. This is easy to do by using dplyr::select_vars(), which will soon be extracted to the new tidyselect package:

library("dplyr")

group_wrapper <- function(df, groups = rlang::chr()) {
  groups <- select_vars(tbl_vars(df), !! enquo(groups))
  group_by(df, !!! rlang::syms(groups))
}
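
For readers on a later dplyr where the selection helpers live in tidyselect, the same idea might be sketched as follows. This is an assumption-laden variant rather than the code above: it uses tidyselect::eval_select() in place of select_vars(), base character() in place of rlang::chr(), and the name group_wrapper_ts is hypothetical.

```r
library(dplyr)

# Hypothetical variant of group_wrapper() built on tidyselect.
group_wrapper_ts <- function(df, groups = character()) {
  # Resolve the selection to column positions; names() gives the column names.
  pos <- tidyselect::eval_select(rlang::enquo(groups), df)
  # Turn the names back into symbols and splice them into group_by().
  group_by(df, !!!rlang::syms(names(pos)))
}

grouped <- group_wrapper_ts(mtcars, c(disp, am))
print(group_vars(grouped))  # "disp" "am"
```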

Alternatively you can wrap the new group_by_at() verb which does have select semantics:

group_wrapper <- function(df, groups = rlang::chr()) {
  group_by_at(df, vars(!! enquo(groups)))
}

Let's try it out:

group_wrapper(mtcars, c(disp, am))
#> # A tibble: 32 x 11
#> # Groups:   disp, am [27]
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21.0     6   160   110  3.90  2.62  16.5     0     1     4     4
#> # ... with 22 more rows

This interface has the advantage of supporting all select() operations to select the columns to group by.
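
Applied to the function from the question, the group_by_at() approach might look like the sketch below. It is one possible adaptation, not a tested drop-in replacement: the default arguments and the unused digits argument are dropped for brevity, and gather() is kept as in the question.

```r
library(dplyr)
library(tidyr)

# Sketch of table_summary() with selection semantics for `group`.
table_summary <- function(df, id, select, group) {
  quo_id     <- enquo(id)
  quo_select <- enquo(select)
  quo_group  <- enquo(group)
  df %>%
    # select() already has selection semantics, so c(gear, hp) works here
    dplyr::select(!!quo_id, !!quo_select, !!quo_group) %>%
    unique() %>%
    # long format, in case more than one variable is summarised
    gather(key = variable, value = value, !!quo_select) %>%
    # group_by_at() + vars() gives group_by() selection semantics too
    group_by_at(vars(!!quo_group, variable)) %>%
    summarise(n    = n(),
              mean = mean(value, na.rm = TRUE))
}

cars <- mtcars
cars$model <- rownames(cars)
res <- table_summary(cars, id = model, select = c(mpg), group = c(gear, hp))
print(names(res))  # "gear" "hp" "variable" "n" "mean"
```

A single bare column such as group = gear should work through the same code path, since vars() accepts it as a selection.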

Note that I'm using rlang::chr() as default argument because c() returns NULL which isn't supported by selecting functions (we may want to change that in the future). chr() called without arguments returns a character vector of length 0.
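
The difference between the two defaults can be seen in base R. This sketch uses character(0) as a stand-in for what chr() returns when called without arguments, per the description above.

```r
empty_c   <- c()           # NULL: there is no vector at all
empty_chr <- character(0)  # a character vector that happens to be empty

print(is.null(empty_c))         # TRUE
print(is.character(empty_chr))  # TRUE
print(length(empty_chr))        # 0
```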

answered Sep 24 '22 09:09

Lionel Henry
