I want to use use the dplyr::group_by
function inside another function, but I do not know how to pass the arguments to this function.
Can someone provide a working example?
library(dplyr) data(iris) iris %.% group_by(Species) %.% summarise(n = n()) # ## Source: local data frame [3 x 2] ## Species n ## 1 virginica 50 ## 2 versicolor 50 ## 3 setosa 50 mytable0 <- function(x, ...) x %.% group_by(...) %.% summarise(n = n()) mytable0(iris, "Species") # OK ## Source: local data frame [3 x 2] ## Species n ## 1 virginica 50 ## 2 versicolor 50 ## 3 setosa 50 mytable1 <- function(x, key) x %.% group_by(as.name(key)) %.% summarise(n = n()) mytable1(iris, "Species") # Wrong! # Error: unsupported type for column 'as.name(key)' (SYMSXP) mytable2 <- function(x, key) x %.% group_by(key) %.% summarise(n = n()) mytable2(iris, "Species") # Wrong! # Error: index out of bounds
dplyr functions use non-standard evaluation. That is why you do not have to quote your variable names when you do something like select(mtcars, mpg) , and why select(mtcars, "mpg") doesn't work. When you use dplyr in functions, you will likely want to use "standard evaluation".
Group_by() function belongs to the dplyr package in the R programming language, which groups the data frames. Group_by() function alone will not give any output. It should be followed by summarise() function with an appropriate action to perform. It works similar to GROUP BY in SQL and pivot table in excel.
group_by() takes an existing tbl and converts it into a grouped tbl where operations are performed "by group". ungroup() removes grouping.
%>% is called the forward pipe operator in R. It provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. It is defined by the package magrittr (CRAN) and is heavily used by dplyr (CRAN).
For programming, group_by_
is the counterpart to group_by
:
library(dplyr) mytable <- function(x, ...) x %>% group_by_(...) %>% summarise(n = n()) mytable(iris, "Species") # or iris %>% mytable("Species")
which gives:
Species n 1 setosa 50 2 versicolor 50 3 virginica 50
Update At the time this was written dplyr used %.%
which is what was originally used above but now %>%
is favored so have changed above to that to keep this relevant.
Update 2 regroup is now deprecated, use group_by_ instead.
Update 3 group_by_(list(...))
now becomes group_by_(...)
in new version of dplyr as per Roberto's comment.
Update 4 Added minor variation suggested in comments.
Update 5: With rlang/tidyeval it is now possible to do this:
library(rlang) mytable <- function(x, ...) { group_ <- syms(...) x %>% group_by(!!!group_) %>% summarise(n = n()) } mytable(iris, "Species")
or passing Species
unevaluated, i.e. no quotes around it:
library(rlang) mytable <- function(x, ...) { group_ <- enquos(...) x %>% group_by(!!!group_) %>% summarise(n = n()) } mytable(iris, Species)
Update 6: There is now a {{...}} notation that works if there is just one grouping variable:
mytable <- function(x, group) { x %>% group_by({{group}}) %>% summarise(n = n()) } mytable(iris, Species)
UPDATE: As of dplyr 0.7.0 you can use tidy eval to accomplish this.
See http://dplyr.tidyverse.org/articles/programming.html for more details.
library(tidyverse) data("iris") my_table <- function(df, group_var) { group_var <- enquo(group_var) # Create quosure df %>% group_by(!!group_var) %>% # Use !! to unquote the quosure summarise(n = n()) } my_table(iris, Species) > my_table(iris, Species) # A tibble: 3 x 2 Species n <fctr> <int> 1 setosa 50 2 versicolor 50 3 virginica 50
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With