dplyr: How to use group_by inside a function?

I want to use the dplyr::group_by function inside another function, but I do not know how to pass the arguments to it.

Can someone provide a working example?

    library(dplyr)
    data(iris)
    iris %.% group_by(Species) %.% summarise(n = n())
    #
    ## Source: local data frame [3 x 2]
    ##      Species  n
    ## 1  virginica 50
    ## 2 versicolor 50
    ## 3     setosa 50

    mytable0 <- function(x, ...) x %.% group_by(...) %.% summarise(n = n())
    mytable0(iris, "Species") # OK
    ## Source: local data frame [3 x 2]
    ##      Species  n
    ## 1  virginica 50
    ## 2 versicolor 50
    ## 3     setosa 50

    mytable1 <- function(x, key) x %.% group_by(as.name(key)) %.% summarise(n = n())
    mytable1(iris, "Species") # Wrong!
    # Error: unsupported type for column 'as.name(key)' (SYMSXP)

    mytable2 <- function(x, key) x %.% group_by(key) %.% summarise(n = n())
    mytable2(iris, "Species") # Wrong!
    # Error: index out of bounds

2 Answers

For programming, group_by_ is the counterpart to group_by:

    library(dplyr)

    mytable <- function(x, ...) x %>% group_by_(...) %>% summarise(n = n())
    mytable(iris, "Species")
    # or iris %>% mytable("Species")

which gives:

         Species  n
    1     setosa 50
    2 versicolor 50
    3  virginica 50

Update: When this was written, dplyr used %.%, which the code above originally used. %>% is now favored, so the code above has been changed to keep this answer relevant.

Update 2: regroup is now deprecated; use group_by_ instead.

Update 3: In newer versions of dplyr, group_by_(list(...)) becomes group_by_(...), as per Roberto's comment.

Update 4: Added a minor variation suggested in the comments.

Update 5: With rlang/tidyeval it is now possible to do this:

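    library(rlang)
    mytable <- function(x, ...) {
      # c(...) collects the strings into one character vector, which is
      # what syms() expects; this also handles more than one grouping column
      group_ <- syms(c(...))
      x %>%
        group_by(!!!group_) %>%
        summarise(n = n())
    }
    mytable(iris, "Species")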

or passing Species unevaluated, i.e. no quotes around it:

    library(rlang)
    mytable <- function(x, ...) {
      group_ <- enquos(...)
      x %>%
        group_by(!!!group_) %>%
        summarise(n = n())
    }
    mytable(iris, Species)

Update 6: There is now a {{...}} notation that works if there is just one grouping variable:

    mytable <- function(x, group) {
      x %>%
        group_by({{group}}) %>%
        summarise(n = n())
    }
    mytable(iris, Species)

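Since {{ }} covers only a single grouping variable, here is a minimal sketch of one way to group by several columns whose names arrive as strings. It assumes dplyr 1.0.0 or later (for across() and the .groups argument); the helper name count_by and its keys parameter are hypothetical and not part of the answer above.

```r
library(dplyr)

# Hypothetical helper: `keys` is a character vector of column names.
# Assumes dplyr >= 1.0.0, where across() and all_of() can be used in group_by().
count_by <- function(x, keys) {
  x %>%
    group_by(across(all_of(keys))) %>%   # select grouping columns by name
    summarise(n = n(), .groups = "drop") # drop grouping in the result
}

# One grouping column, as in the question
res_one <- count_by(iris, "Species")

# Two grouping columns from a built-in dataset
res_two <- count_by(mtcars, c("cyl", "gear"))
```

all_of() raises an error if any requested column is missing, which may be preferable to silently grouping by fewer columns.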
UPDATE: As of dplyr 0.7.0 you can use tidy eval to accomplish this.

See http://dplyr.tidyverse.org/articles/programming.html for more details.

    library(tidyverse)
    data("iris")

    my_table <- function(df, group_var) {
      group_var <- enquo(group_var)      # Create quosure
      df %>%
        group_by(!!group_var) %>%        # Use !! to unquote the quosure
        summarise(n = n())
    }

    my_table(iris, Species)

    > my_table(iris, Species)
    # A tibble: 3 x 2
         Species     n
          <fctr> <int>
    1     setosa    50
    2 versicolor    50
    3  virginica    50
