dplyr-0.6.0 programming unquoting

I'm trying to write a simple wrapper that uses summarise() to summarise arbitrary variables by arbitrary groups. Now that I have the correct library version loaded I've made progress, but I'm confused (again) about how to unquote arguments that have multiple values.

I currently have the following function...

table_summary <- function(df     = .,
                          id     = individual_id,
                          select = c(),
                          group  = site,
                          ...){
    ## Quote all arguments (see http://dplyr.tidyverse.org/articles/programming.html)
    quo_id     <- enquo(id)
    quo_select <- enquo(select)
    quo_group  <- enquo(group)
    ## Subset the data
    df <- df %>%
          dplyr::select(!!quo_id, !!quo_select, !!quo_group) %>%
          unique()
    ## gather() data, just in case there is > 1 variable selected to be summarised
    df <- df %>%
          gather(key = variable, value = value, !!quo_select)
    ## Summarise selected variables by specified groups
    results <- df %>%
           group_by(!!quo_group, variable) %>%
           summarise(n    = n(),
                     mean = mean(value, na.rm = TRUE))
    return(results)
}

Which gets most of the way there and works if I specify one grouping variable...

> table_summary(df = mtcars, id = model, select = c(mpg), group = gear)
# A tibble: 3 x 4
# Groups:   c(gear) [?]
       gear variable     n     mean
      <dbl>    <chr> <int>    <dbl>
1         3      mpg    15 16.10667
2         4      mpg    12 24.53333
3         5      mpg     5 21.38000

...but fails at group_by(!!quo_group, variable) when I specify more than one grouping variable with group = c(gear, hp)...

> mtcars$model <- rownames(mtcars)
> table_summary(df = mtcars, id = model, select = c(mpg), group = c(gear, hp))
Error in mutate_impl(.data, dots) : 
  Column `c(gear, hp)` must be length 32 (the group size) or one, not 64

I went back and re-read the dplyr programming documentation, which says you can capture multiple variables using quos() instead of enquo() and then unquote-splice them with !!!, so I tried...

table_summary <- function(df     = .,
                          id     = individual_id,
                          select = c(),
                          group  = c(),
                          digits = 3,
                          ...){
    ## Quote all arguments (see http://dplyr.tidyverse.org/articles/programming.html)
    quo_id     <- enquo(id)
    quo_select <- enquo(select)
    quo_group  <- quos(group)  ## Use quos() rather than enquo()
    UQS(quo_group) %>% print() ## Check to see what quo_group holds
    ## Subset the data
    df <- df %>%
          dplyr::select(!!quo_id, !!quo_select, !!!quo_group) %>%
          unique()
    ## gather() data, just in case there is > 1 variable selected to be summarised
    df <- df %>%
          gather(key = variable, value = value, !!quo_select)
    ## Summarise selected variables by specified groups
    results <- df %>%
               group_by(!!!quo_group, variable) %>%
               summarise(n    = n(),
                         mean = mean(value, na.rm = TRUE))
    return(results)
}

...which now fails at the first reference to !!!quo_group within dplyr::select(), regardless of how many variables are specified under group = ...

> table_summary(df = mtcars, id = model, select = c(mpg), group = c(gear))
[[1]]
<quosure: frame>
~group

attr(,"class")
[1] "quosures"
Error in overscope_eval_next(overscope, expr) : object 'gear' not found
> traceback()
17: .Call(rlang_eval, f_rhs(quo), overscope)
16: overscope_eval_next(overscope, expr)
15: FUN(X[[i]], ...)
14: lapply(.x, .f, ...)
13: map(.x[matches], .f, ...)
12: map_if(ind_list, !is_helper, eval_tidy, data = names_list)
11: select_vars(names(.data), !(!(!quos(...))))
10: select.data.frame(., !(!quo_id), !(!quo_select), !(!(!quo_group)))
9: dplyr::select(., !(!quo_id), !(!quo_select), !(!(!quo_group)))
8: function_list[[i]](value)
7: freduce(value, `_function_list`)
6: `_fseq`(`_lhs`)
5: eval(quote(`_fseq`(`_lhs`)), env, env)
4: eval(quote(`_fseq`(`_lhs`)), env, env)
3: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
2: df %>% dplyr::select(!(!quo_id), !(!quo_select), !(!(!quo_group))) %>% 
       unique()
1: table_summary(df = mtcars, id = model, select = c(mpg), group = c(gear))
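The ~group in the printed output above is a clue. As a rough base-R analogy (quote() and substitute() standing in for quos() and enquo(); this is not how rlang is implemented), quoting an argument's name directly captures the literal symbol, whereas substituting it recovers the expression the caller supplied. The helper capture_both() is hypothetical, for illustration only.

```r
# Hypothetical helper: contrasts quoting an argument's name with
# recovering the expression the caller supplied for that argument.
capture_both <- function(group) {
  list(
    literal  = quote(group),      # analogous to quos(group): the symbol `group` itself
    supplied = substitute(group)  # analogous to enquo(group): what the caller typed
  )
}

res <- capture_both(c(gear, hp))
print(res$literal)   # group
print(res$supplied)  # c(gear, hp)
```

If quos(group) likewise only holds the symbol group, that would be consistent with the "object 'gear' not found" error: group is not a column, so it is likely looked up as the function argument, whose expression c(gear) is then evaluated outside the data.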

What seems strange, and I think is the source of the problem, is that !!!quo_group (i.e. UQS(quo_group)) prints a single quosure, ~group, rather than a list of quosures for the grouping variables, which is what adding a print() to the worked example from the documentation shows should happen...

> my_summarise <- function(df, ...) {
    group_by <- quos(...)
    UQS(group_by) %>% print()
    df %>%
    group_by(!!!group_by) %>%
    summarise(a = mean(a))
  }
> df <- tibble(
    g1 = c(1, 1, 2, 2, 2),
    g2 = c(1, 2, 1, 2, 1),
    a = sample(5), 
    b = sample(5)
  )
> my_summarise(df, g1, g2)
[[1]]
<quosure: global>
~g1

[[2]]
<quosure: global>
~g2

attr(,"class")
[1] "quosures"
# A tibble: 4 x 3
# Groups:   g1 [?]
     g1    g2     a
  <dbl> <dbl> <dbl>
1     1     1   1.0
2     1     2   5.0
3     2     1   2.5
4     2     2   4.0
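The worked example passes two quosures through !!!, which splices each element of the list into the call as a separate argument. Base R (4.0.0 or later) has an analogous operation in bquote(..., splice = TRUE): .() inserts one value and ..() splices a list. This is only an analogy to rlang's !! and !!!, and the calls built below are never evaluated, so dplyr is not needed.

```r
# Grouping variables held as a list of symbols (roughly like quos(g1, g2)).
groups <- list(as.name("g1"), as.name("g2"))

# .() inserts a single value, similar to !!
one <- bquote(group_by(df, .(groups[[1]])))

# ..() splices every list element as its own argument, similar to !!!
spliced <- bquote(group_by(df, ..(groups)), splice = TRUE)

print(one)      # group_by(df, g1)
print(spliced)  # group_by(df, g1, g2)
```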

I'd prefer to supply the grouping variables explicitly as a named argument to my function, but I decided to test whether it works when they are supplied via ... instead...

table_summary <- function(df     = .,
                          id     = individual_id,
                          select = c(),
                          group  = c(),
                          digits = 3,
                          ...){
    ## Quote all arguments (see http://dplyr.tidyverse.org/articles/programming.html)
    quo_id     <- enquo(id)
    quo_select <- enquo(select)
    ## quo_group  <- quos(group)
    quo_group  <- quos(...)
    UQS(quo_group) %>% print()
    ## Subset the data
    df <- df %>%
          dplyr::select(!!quo_id, !!quo_select, !!!quo_group) %>%
          unique()
    ## gather() data, just in case there is > 1 variable selected to be summarised
    df <- df %>%
          gather(key = variable, value = value, !!quo_select)
    ## Summarise selected variables by specified groups
    results <- df %>%
               group_by(!!!quo_group, variable) %>%
               summarise(n    = n(),
                         mean = mean(value, na.rm = TRUE))
    return(results)
}

...but it doesn't: this time UQS(quo_group) prints NULL, so the variables are neither selected nor grouped by...

> table_summary(df = mtcars, id = model, select = c(mpg), gear, hp)
NULL
# A tibble: 1 x 3
  variable     n     mean
     <chr> <int>    <dbl>
1      mpg    32 20.09062
> table_summary(df = mtcars, id = model, select = c(mpg), gear)
NULL
# A tibble: 1 x 3
  variable     n     mean
     <chr> <int>    <dbl>
1      mpg    32 20.09062
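A likely explanation for the NULL lies in R's argument matching rather than in quos(): formals that come before ... in the signature (here group and digits) are filled positionally before ... receives anything. So gear is matched to group, hp to digits, and ... stays empty. The hypothetical show_matching() below reuses table_summary()'s signature but only reports what each argument received; nothing is evaluated, so dplyr is not needed.

```r
# Hypothetical helper with the same formals as table_summary();
# it only reports how the call's arguments were matched.
show_matching <- function(df = ., id = individual_id, select = c(),
                          group = c(), digits = 3, ...) {
  list(group  = substitute(group),
       digits = substitute(digits),
       dots   = substitute(list(...)))
}

m <- show_matching(df = mtcars, id = model, select = c(mpg), gear, hp)
print(m$group)   # gear
print(m$digits)  # hp
print(m$dots)    # list(): `...` received nothing
```

Naming the argument (group = ...) or placing ... before group and digits in the signature would avoid this; formals placed after ... can only be matched by name.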

I've gone through this cycle several times, checking each way of using enquo() and quos(), but despite having read the dplyr programming documentation several times I cannot see where I am going wrong.

asked May 26 '17 at 13:05 by slackline


1 Answer

IIUC your post, you want to supply c(col1, col2) to group_by(). This is not supported by that verb:

group_by(mtcars, c(cyl, am))
#> Error in mutate_impl(.data, dots) :
#>   Column `c(cyl, am)` must be length 32 (the number of rows) or one, not 64

That's because group_by() has mutate semantics, not select semantics. That means that the expressions you supply to group_by() are transformative expressions. This is a surprising but quite handy feature. For example you can group by disp cut into three intervals like this:

group_by(mtcars, cut3 = cut(disp, 3))

This also means that if you supply c(cyl, am), it will concatenate the two columns together and return a vector of length 64, while it was expecting a length of 32 (the number of rows).
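The length mismatch can be reproduced without dplyr. A mutate-style verb evaluates each expression against the columns and needs a result with one value per row (or a single value). Base R's with() is used below as a stand-in for that evaluation; it is an analogy, not dplyr's actual mechanism.

```r
# Evaluate expressions against mtcars' columns, roughly as a
# mutate-style verb such as group_by() would.
n_rows <- nrow(mtcars)                   # 32

cut3     <- with(mtcars, cut(disp, 3))   # one factor value per row
combined <- with(mtcars, c(cyl, am))     # two columns concatenated

length(cut3)      # 32: usable as a grouping column
nlevels(cut3)     # 3 intervals
length(combined)  # 64: twice the number of rows, hence the error
```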

So your problem is that you want a wrapper to group_by() that has selection semantics. This is easy to do by using dplyr::select_vars(), which will soon be extracted to the new tidyselect package:

library("dplyr")

group_wrapper <- function(df, groups = rlang::chr()) {
  groups <- select_vars(tbl_vars(df), !! enquo(groups))
  group_by(df, !!! rlang::syms(groups))
}

Alternatively you can wrap the new group_by_at() verb which does have select semantics:

group_wrapper <- function(df, groups = rlang::chr()) {
  group_by_at(df, vars(!! enquo(groups)))
}

Let's try it out:

group_wrapper(mtcars, c(disp, am))
#> # A tibble: 32 x 11
#> # Groups:   disp, am [27]
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21.0     6   160   110  3.90  2.62  16.5     0     1     4     4
#> # ... with 22 more rows

This interface has the advantage of supporting all select() operations to select the columns to group by.
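To make "selection semantics" more concrete, here is a rough base-R sketch: evaluate the expression in an environment where each column name stands for its position, then map the positions back to names. toy_select_names() is a hypothetical helper that handles only simple forms such as c() and :, not the full select() vocabulary. The resulting names can then be handed to any grouping function, here base aggregate(), much as the answer hands them to group_by(df, !!! rlang::syms(groups)).

```r
# Hypothetical toy version of selection semantics: column names
# evaluate to column positions instead of column values.
toy_select_names <- function(df, expr) {
  positions <- as.list(seq_along(df))
  names(positions) <- names(df)
  idx <- eval(substitute(expr), positions, parent.frame())
  names(df)[idx]
}

toy_select_names(mtcars, c(disp, am))  # c("disp", "am")
toy_select_names(mtcars, cyl:hp)       # c("cyl", "disp", "hp")

# Group by the selected names (base aggregate() as the grouping step).
groups <- toy_select_names(mtcars, c(gear, am))
agg <- aggregate(mtcars["mpg"], by = mtcars[groups], FUN = mean)
nrow(agg)  # 4 gear/am combinations occur in mtcars
```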

Note that I'm using rlang::chr() as the default argument because c() returns NULL, which isn't supported by selecting functions (we may want to change that in the future). chr() called without arguments returns a character vector of length 0.
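The difference between the two defaults can be checked in base R. Here character() stands in for rlang::chr(); both return a zero-length character vector when called without arguments, while c() returns NULL.

```r
empty_c   <- c()          # NULL: c() with no arguments
empty_chr <- character()  # zero-length character vector

is.null(empty_c)          # TRUE
is.character(empty_chr)   # TRUE
length(empty_chr)         # 0
```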

answered Sep 24 '22 at 09:09 by Lionel Henry