I'm trying to write a simple wrapper to summarise() arbitrary variables by arbitrary groups. I have made progress now that I have the correct library version loaded, but I am confused (again) about how to unquote arguments with multiple values.
I currently have the following function...
table_summary <- function(df = .,
id = individual_id,
select = c(),
group = site,
...){
## Quote all arguments (see http://dplyr.tidyverse.org/articles/programming.html)
quo_id <- enquo(id)
quo_select <- enquo(select)
quo_group <- enquo(group)
## Subset the data
df <- df %>%
dplyr::select(!!quo_id, !!quo_select, !!quo_group) %>%
unique()
## gather() data, just in case there is > 1 variable selected to be summarised
df <- df %>%
gather(key = variable, value = value, !!quo_select)
## Summarise selected variables by specified groups
results <- df %>%
group_by(!!quo_group, variable) %>%
summarise(n = n(),
mean = mean(value, na.rm = TRUE))
return(results)
}
Which gets most of the way there and works if I specify one grouping variable...
> table_summary(df = mtcars, id = model, select = c(mpg), group = gear)
# A tibble: 3 x 4
# Groups: c(gear) [?]
gear variable n mean
<dbl> <chr> <int> <dbl>
1 3 mpg 15 16.10667
2 4 mpg 12 24.53333
3 5 mpg 5 21.38000
...but fails at the group_by(!!quo_group, variable) step when I specify more than one grouping variable, e.g. group = c(gear, hp)...
> mtcars$model <- rownames(mtcars)
> table_summary(df = mtcars, id = model, select = c(mpg), group = c(gear, hp))
Error in mutate_impl(.data, dots) :
Column `c(gear, hp)` must be length 32 (the group size) or one, not 64
I went back and re-read the programming dplyr documentation and read that you can capture multiple variables using quos() instead of enquo() and then unquote-splice them with !!!, so I tried...
table_summary <- function(df = .,
id = individual_id,
select = c(),
group = c(),
digits = 3,
...){
## Quote all arguments (see http://dplyr.tidyverse.org/articles/programming.html)
quo_id <- enquo(id)
quo_select <- enquo(select)
quo_group <- quos(group) ## Use quos() rather than enquo()
UQS(quo_group) %>% print() ## Check to see what quo_group holds
## Subset the data
df <- df %>%
dplyr::select(!!quo_id, !!quo_select, !!!quo_group) %>%
unique()
## gather() data, just in case there is > 1 variable selected to be summarised
df <- df %>%
gather(key = variable, value = value, !!quo_select)
## Summarise selected variables by specified groups
results <- df %>%
group_by(!!!quo_group, variable) %>%
summarise(n = n(),
mean = mean(value, na.rm = TRUE))
return(results)
}
...which now fails at the first reference to !!!quo_group within dplyr::select(), regardless of how many variables are specified under group = ...
> table_summary(df = mtcars, id = model, select = c(mpg), group = c(gear))
[[1]]
<quosure: frame>
~group
attr(,"class")
[1] "quosures"
Error in overscope_eval_next(overscope, expr) : object 'gear' not found
> traceback()
17: .Call(rlang_eval, f_rhs(quo), overscope)
16: overscope_eval_next(overscope, expr)
15: FUN(X[[i]], ...)
14: lapply(.x, .f, ...)
13: map(.x[matches], .f, ...)
12: map_if(ind_list, !is_helper, eval_tidy, data = names_list)
11: select_vars(names(.data), !(!(!quos(...))))
10: select.data.frame(., !(!quo_id), !(!quo_select), !(!(!quo_group)))
9: dplyr::select(., !(!quo_id), !(!quo_select), !(!(!quo_group)))
8: function_list[[i]](value)
7: freduce(value, `_function_list`)
6: `_fseq`(`_lhs`)
5: eval(quote(`_fseq`(`_lhs`)), env, env)
4: eval(quote(`_fseq`(`_lhs`)), env, env)
3: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
2: df %>% dplyr::select(!(!quo_id), !(!quo_select), !(!(!quo_group))) %>%
unique()
1: table_summary(df = mtcars, id = model, select = c(mpg), group = c(gear))
What seems strange, and what I think is the source of the problem, is that !!!quo_group (i.e. UQS(quo_group)) prints out a single quosure, ~group, rather than one quosure per variable, which is what adding a print() to the worked example shows should happen...
> my_summarise <- function(df, ...) {
group_by <- quos(...)
UQS(group_by) %>% print()
df %>%
group_by(!!!group_by) %>%
summarise(a = mean(a))
}
> df <- tibble(
g1 = c(1, 1, 2, 2, 2),
g2 = c(1, 2, 1, 2, 1),
a = sample(5),
b = sample(5)
)
> my_summarise(df, g1, g2)
[[1]]
<quosure: global>
~g1
[[2]]
<quosure: global>
~g2
attr(,"class")
[1] "quosures"
# A tibble: 4 x 3
# Groups: g1 [?]
g1 g2 a
<dbl> <dbl> <dbl>
1 1 1 1.0
2 1 2 5.0
3 2 1 2.5
4 2 2 4.0
I'd like to supply the grouping variables explicitly as a named argument to my function, but I decided to test whether the function works when they are supplied via ... instead:
table_summary <- function(df = .,
id = individual_id,
select = c(),
group = c(),
digits = 3,
...){
## Quote all arguments (see http://dplyr.tidyverse.org/articles/programming.html)
quo_id <- enquo(id)
quo_select <- enquo(select)
## quo_group <- quos(group)
quo_group <- quos(...)
UQS(quo_group) %>% print()
## Subset the data
df <- df %>%
dplyr::select(!!quo_id, !!quo_select, !!!quo_group) %>%
unique()
## gather() data, just in case there is > 1 variable selected to be summarised
df <- df %>%
gather(key = variable, value = value, !!quo_select)
## Summarise selected variables by specified groups
results <- df %>%
group_by(!!!quo_group, variable) %>%
summarise(n = n(),
mean = mean(value, na.rm = TRUE))
return(results)
}
...but it doesn't; quos() again unquote-splices to NULL, so the variables are neither selected nor grouped by...
> table_summary(df = mtcars, id = model, select = c(mpg), gear, hp)
NULL
# A tibble: 1 x 3
variable n mean
<chr> <int> <dbl>
1 mpg 32 20.09062
> table_summary(df = mtcars, id = model, select = c(mpg), gear)
NULL
# A tibble: 1 x 3
variable n mean
<chr> <int> <dbl>
1 mpg 32 20.09062
I've gone through this cycle several times now, checking each method of using enquo() and quos(), but I cannot see where I am going wrong, despite having read the programming dplyr documentation several times.
IIUC your post, you want to supply c(col1, col2) to group_by(). This is not supported by that verb:
group_by(mtcars, c(cyl, am))
#> Error in mutate_impl(.data, dots) :
#> Column `c(cyl, am)` must be length 32 (the number of rows) or one, not 64
That's because group_by() has mutate semantics, not select semantics. That means that the expressions you supply to group_by() are transformative expressions. This is a surprising but quite handy feature. For example, you can group by disp cut into three intervals like this:
group_by(mtcars, cut3 = cut(disp, 3))
This also means that if you supply c(cyl, am), it will concatenate the two columns together and return a vector of length 64, while it was expecting a length of 32 (the number of rows).
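The length mismatch can be reproduced outside of dplyr. The following is a minimal base R sketch of what the expression c(cyl, am) amounts to once it is evaluated against the columns of mtcars; the variable name combined is only illustrative.

```r
# c() on two columns concatenates them end to end, so the result
# has twice as many elements as the data frame has rows.
combined <- c(mtcars$cyl, mtcars$am)

length(combined)  # 64
nrow(mtcars)      # 32
```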
So your problem is that you want a wrapper to group_by() that has selection semantics. This is easy to do by using dplyr::select_vars(), which will soon be extracted to the new tidyselect package:
library("dplyr")
group_wrapper <- function(df, groups = rlang::chr()) {
groups <- select_vars(tbl_vars(df), !! enquo(groups))
group_by(df, !!! rlang::syms(groups))
}
Alternatively, you can wrap the new group_by_at() verb, which does have select semantics:
group_wrapper <- function(df, groups = rlang::chr()) {
group_by_at(df, vars(!! enquo(groups)))
}
Let's try it out:
group_wrapper(mtcars, c(disp, am))
#> # A tibble: 32 x 11
#> # Groups: disp, am [27]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21.0 6 160 110 3.90 2.62 16.5 0 1 4 4
#> # ... with 22 more rows
This interface has the advantage of supporting all select() operations to select the columns to group by.
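As a rough sketch of how the same idea might carry over to the table_summary() function from the question, the version below gives the group argument selection semantics. It is not taken from the documentation and rests on assumptions: it uses across() and the {{ }} embrace operator (dplyr 1.0.0 or later) in place of group_by_at(), and pivot_longer() (tidyr 1.0.0 or later) in place of gather(). The data frame name mt is only illustrative.

```r
library(dplyr)
library(tidyr)

# Sketch only: assumes dplyr >= 1.0.0 and tidyr >= 1.0.0.
table_summary <- function(df, id, select, group) {
  df %>%
    # select() has selection semantics, so c(gear, am) works here
    dplyr::select({{ id }}, {{ select }}, {{ group }}) %>%
    distinct() %>%
    # long format, in case more than one variable is summarised
    pivot_longer(cols = {{ select }},
                 names_to = "variable", values_to = "value") %>%
    # across() lets group_by() take a selection instead of an expression
    group_by(across({{ group }}), variable) %>%
    summarise(n = n(),
              mean = mean(value, na.rm = TRUE),
              .groups = "drop")
}

mt <- mtcars
mt$model <- rownames(mtcars)

table_summary(mt, id = model, select = c(mpg), group = gear)
table_summary(mt, id = model, select = c(mpg, wt), group = c(gear, am))
```

Because the grouping columns go through a selection, helpers such as starts_with() should also be usable in group.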
Note that I'm using rlang::chr() as the default argument because c() returns NULL, which isn't supported by selecting functions (we may want to change that in the future). chr() called without arguments returns a character vector of length 0.
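The difference between the two defaults can be checked in base R alone. This small sketch uses character(0), the base R way of writing a character vector of length 0, as a stand-in for the value described above; the names empty_c and empty_chr are only illustrative.

```r
# c() with no arguments is NULL, not an empty vector of any type
empty_c <- c()
is.null(empty_c)         # TRUE

# character(0) is a genuine character vector that happens to be empty
empty_chr <- character(0)
is.null(empty_chr)       # FALSE
is.character(empty_chr)  # TRUE
length(empty_chr)        # 0
```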