dplyr: How to use group_by inside a function?

I want to use the dplyr::group_by function inside another function, but I do not know how to pass the arguments to it.

Can someone provide a working example?

    library(dplyr)
    data(iris)
    iris %.% group_by(Species) %.% summarise(n = n())
    #
    ## Source: local data frame [3 x 2]
    ##      Species  n
    ## 1  virginica 50
    ## 2 versicolor 50
    ## 3     setosa 50

    mytable0 <- function(x, ...) x %.% group_by(...) %.% summarise(n = n())
    mytable0(iris, "Species") # OK
    ## Source: local data frame [3 x 2]
    ##      Species  n
    ## 1  virginica 50
    ## 2 versicolor 50
    ## 3     setosa 50

    mytable1 <- function(x, key) x %.% group_by(as.name(key)) %.% summarise(n = n())
    mytable1(iris, "Species") # Wrong!
    # Error: unsupported type for column 'as.name(key)' (SYMSXP)

    mytable2 <- function(x, key) x %.% group_by(key) %.% summarise(n = n())
    mytable2(iris, "Species") # Wrong!
    # Error: index out of bounds
Emilio Torres Manzanera asked Feb 16 '14 17:02


People also ask

Can you use dplyr in a function?

dplyr functions use non-standard evaluation. That is why you do not have to quote your variable names when you write something like select(mtcars, mpg), and why a quoted name is generally not a drop-in substitute: group_by(mtcars, "cyl"), for example, groups by the constant string "cyl" rather than by the cyl column. When you use dplyr inside your own functions, you will likely want to use "standard evaluation".
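
A minimal sketch of that difference, assuming dplyr 0.7.0 or later (where the .data pronoun is available). The variable name key is only illustrative:

```r
library(dplyr)

# A quoted name is evaluated as a constant string, so every row
# lands in the same single group.
by_string <- mtcars %>% group_by("cyl") %>% summarise(n = n())
nrow(by_string)

# .data[[key]] looks the column up by the name stored in `key`,
# so there is one group per distinct value of cyl.
key <- "cyl"
by_column <- mtcars %>% group_by(.data[[key]]) %>% summarise(n = n())
nrow(by_column)
```

The first result has a single row, while the second has one row per cylinder count.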

What function is group_by in R?

group_by() belongs to the dplyr package for R and groups the rows of a data frame. On its own, group_by() does not change how the data looks; it is usually followed by summarise() or another verb that then operates per group. It works much like GROUP BY in SQL or a pivot table in Excel.

What does the group_by function do?

group_by() takes an existing tbl and converts it into a grouped tbl where operations are performed "by group". ungroup() removes grouping.
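
As a small illustration of that round trip, the sketch below uses the built-in iris data and dplyr's is_grouped_df() and group_vars() helpers to inspect the grouping:

```r
library(dplyr)

# group_by() returns the same rows, with grouping metadata attached
grouped <- group_by(iris, Species)
is_grouped_df(grouped)    # TRUE
group_vars(grouped)       # "Species"

# ungroup() drops the grouping metadata again
ungrouped <- ungroup(grouped)
is_grouped_df(ungrouped)  # FALSE
```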

What does %>% do in dplyr?

%>% is the forward pipe operator. It forwards a value, or the result of an expression, into the next function call or expression, which lets you chain commands from left to right. It is defined by the magrittr package (CRAN) and is heavily used by dplyr (CRAN).
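
To make the forwarding concrete, here is a short sketch that writes the same computation once as nested calls and once as a pipeline (the names nested and piped are just for illustration):

```r
library(dplyr)

# Nested calls read inside-out
nested <- summarise(group_by(iris, Species), n = n())

# The pipe passes the left-hand side as the first argument of the
# next call, so the same steps read left to right
piped <- iris %>% group_by(Species) %>% summarise(n = n())
```

Both objects hold the same per-species counts.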


2 Answers

For programming, group_by_ is the counterpart to group_by:

    library(dplyr)

    mytable <- function(x, ...) x %>% group_by_(...) %>% summarise(n = n())
    mytable(iris, "Species")
    # or iris %>% mytable("Species")

which gives:

         Species  n
    1     setosa 50
    2 versicolor 50
    3  virginica 50

Update: When this was written, dplyr used %.%, which the code above originally used. %>% is now favored, so the code above has been changed to keep it relevant.

Update 2: regroup is now deprecated; use group_by_ instead.

Update 3: group_by_(list(...)) becomes group_by_(...) in newer versions of dplyr, as per Roberto's comment.

Update 4: Added a minor variation suggested in the comments.

Update 5: With rlang/tidyeval it is now possible to do this:

    library(dplyr)
    library(rlang)

    mytable <- function(x, ...) {
      # syms() takes a single list or character vector, so wrap the dots
      group_ <- syms(list(...))
      x %>%
        group_by(!!!group_) %>%
        summarise(n = n())
    }
    mytable(iris, "Species")

or passing Species unevaluated, i.e. no quotes around it:

    library(dplyr)
    library(rlang)

    mytable <- function(x, ...) {
      # enquos() captures the unevaluated arguments as quosures
      group_ <- enquos(...)
      x %>%
        group_by(!!!group_) %>%
        summarise(n = n())
    }
    mytable(iris, Species)

Update 6: There is now a {{ }} notation that works if there is just one grouping variable:

    mytable <- function(x, group) {
      x %>%
        group_by({{group}}) %>%
        summarise(n = n())
    }
    mytable(iris, Species)
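
For column names held as strings, as in the original question, and possibly several of them at once, one option is across(all_of()) inside group_by. The following is a sketch that assumes dplyr 1.0.0 or later and is not part of the original answer; mytable_chr and keys are hypothetical names:

```r
library(dplyr)

# Hypothetical helper: `keys` is a character vector of column names
mytable_chr <- function(x, keys) {
  x %>%
    group_by(across(all_of(keys))) %>%    # select grouping columns by name
    summarise(n = n(), .groups = "drop")  # return an ungrouped result
}

mytable_chr(iris, "Species")
mytable_chr(mtcars, c("cyl", "gear"))
```

all_of() raises an error if a requested column is missing, which tends to be easier to debug than silently grouping by something else.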
G. Grothendieck answered Oct 03 '22 05:10


UPDATE: As of dplyr 0.7.0 you can use tidy eval to accomplish this.

See http://dplyr.tidyverse.org/articles/programming.html for more details.

    library(tidyverse)
    data("iris")

    my_table <- function(df, group_var) {
      group_var <- enquo(group_var)      # Create quosure
      df %>%
        group_by(!!group_var) %>%        # Use !! to unquote the quosure
        summarise(n = n())
    }

    my_table(iris, Species)

    > my_table(iris, Species)
    # A tibble: 3 x 2
         Species     n
          <fctr> <int>
    1     setosa    50
    2 versicolor    50
    3  virginica    50
Brad Cannell answered Oct 03 '22 05:10