In the current version of dplyr, select arguments can be passed by value:
variable <- "Species"
iris %>%
select(variable)
# Species
#1 setosa
#2 setosa
#3 setosa
#4 setosa
#5 setosa
#6 setosa
#...
But group_by arguments cannot be passed by value:
iris %>%
group_by(variable) %>%
summarise(Petal.Length = mean(Petal.Length))
# Error in grouped_df_impl(data, unname(vars), drop) :
# Column `variable` is unknown
The documented dplyr::select behaviour is
iris %>% select(Species)
And the documented dplyr::group_by behaviour is
iris %>%
group_by(Species) %>%
summarise(Petal.Length = mean(Petal.Length))
Why are select and group_by different with respect to passing arguments by value?
Why is the select call working, and will it continue to work in the future?
Why is the group_by call not working? I'm trying to figure out what combination of quo(), enquo() and !! I should use to make it work.
I need this because I would like to create a function that takes a grouping variable as an input parameter. If possible, the grouping variable should be given as a character string, because two other function parameters are already given as character strings.
The group_by() function belongs to the dplyr package. It takes an existing tbl (data frame) and converts it into a grouped tbl, using the columns given as arguments, so that later operations are performed "by group". It can group by one or several columns, and it is typically followed by summarise() with functions such as mean, sum, count, max or min. ungroup() removes the grouping. On its own, group_by() computes nothing: it only marks the groups, and a following verb does the work.
To pass a string as a symbol or as unevaluated code, you first have to convert it to a symbol or an expression. You can use sym or parse_expr from rlang to do the conversion, and later use !! to unquote:
library(dplyr)
variable <- rlang::sym("Species")
# variable <- rlang::parse_expr("Species")
iris %>%
group_by(!! variable) %>%
summarise(Petal.Length = mean(Petal.Length))
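Since the goal in the question is a function that takes the grouping variable as a character string, the same pattern can be wrapped in a function. The following is only a minimal sketch: the function name mean_by, its parameter names and the output column mean_value are hypothetical and chosen for illustration.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): group `data` by the column named in
# `group_var` and average the column named in `value_var`. Both are strings.
mean_by <- function(data, group_var, value_var) {
  group_sym <- rlang::sym(group_var)  # string -> symbol
  value_sym <- rlang::sym(value_var)  # string -> symbol

  data %>%
    group_by(!!group_sym) %>%                 # unquote the grouping column
    summarise(mean_value = mean(!!value_sym)) # unquote the value column
}

result <- mean_by(iris, "Species", "Petal.Length")
```

Converting the strings once at the top of the function keeps the pipeline itself readable, and every parameter stays a plain character string for the caller.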
!! is a shortcut for UQ(), which unquotes the expression or symbol (UQ() is deprecated in newer rlang releases, so prefer !!). This allows variable to be evaluated only within the scope of where it is called, namely, group_by.
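If converting the string to a symbol feels heavy, there are alternatives that take the string directly. The sketch below assumes dplyr 1.0.0 or later, where the .data pronoun, across() and all_of() are available; it is one possible way to write it, not the only one.

```r
library(dplyr)

variable <- "Species"

# Option 1 (assumes dplyr >= 1.0.0): look the column up by name
# through the .data pronoun.
by_pronoun <- iris %>%
  group_by(.data[[variable]]) %>%
  summarise(Petal.Length = mean(Petal.Length))

# Option 2 (assumes dplyr >= 1.0.0): use tidyselect inside across().
by_across <- iris %>%
  group_by(across(all_of(variable))) %>%
  summarise(Petal.Length = mean(Petal.Length))

# In select(), all_of() states explicitly that `variable` holds column names.
selected <- iris %>% select(all_of(variable))
```

Wrapping the string in all_of() inside select also removes the ambiguity between a column literally called variable and an outside object of that name.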
Difference between sym and parse_expr, and which one to use when?
The short answer: it doesn't matter in this case.
The long answer:
A symbol is a way to refer to an R object, basically the "name" of an object. So sym is similar to as.name in base R. parse_expr, on the other hand, transforms some text into R expressions. This is similar to parse in base R.
Expressions can be any R code, not just code that references R objects. So you can parse code that references an R object, but turning arbitrary code into a symbol is not useful: sym("a + b") gives a symbol literally named a + b, which only resolves if an object with that exact name exists.
In general, you will use sym when your string refers to an object (although parse_expr would also work), and use parse_expr when you are trying to parse any other R code for further evaluation.
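The distinction can be checked directly. The following is a small sketch; the variable names and the ratio expression are made up for illustration.

```r
library(dplyr)
library(rlang)

# For a plain column name, both functions give the same symbol.
sym_version  <- sym("Species")
expr_version <- parse_expr("Species")
same_result  <- identical(sym_version, expr_version)

# For arbitrary code, parse_expr() builds a call that can be evaluated later.
ratio         <- parse_expr("Petal.Length / Petal.Width")
ratio_is_call <- is_call(ratio)

# sym() accepts the same string, but the result is a single symbol whose
# name is the whole text, not a division of two columns.
odd_symbol    <- sym("Petal.Length / Petal.Width")
odd_is_symbol <- is_symbol(odd_symbol)

# The symbol works for grouping; the parsed call works inside summarise().
ratio_by_species <- iris %>%
  group_by(!!sym_version) %>%
  summarise(mean_ratio = mean(!!ratio))
```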
For this particular use case, variable is supposed to be referencing an object, so turning it into a sym would work. On the other hand, parsing it as an expression would also work, because that is the code that is going to be evaluated inside group_by when it is unquoted by !!.