I would like to write a function that uses dplyr internally and takes variable names as strings. Unfortunately, dplyr's use of non-standard evaluation (NSE) makes this rather complicated. From Programming with dplyr I get the following example:
my_summarise <- function(df, var) {
  var <- enquo(var)
  df %>%
    group_by(!!var) %>%
    summarise(a = mean(a))
}
my_summarise(df, g1)
However, I would like to write a function where, instead of g1, I could provide "g1", and I am not able to wrap my head around how to do that.
dplyr >= 1.0
Use a combination of double braces and the across() function:
my_summarise2 <- function(df, group_var) {
  df %>%
    group_by(across({{ group_var }})) %>%
    summarise(mpg = mean(mpg))
}
my_summarise2(mtcars, "cyl")
# A tibble: 3 x 2
# cyl mpg
# <dbl> <dbl>
# 1 4 26.7
# 2 6 19.7
# 3 8 15.1
# same result as above, passing cyl without quotes
my_summarise2(mtcars, cyl)
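If the column names always arrive as strings, wrapping them in all_of() inside across() makes that intent explicit and also handles several grouping columns at once. The following is a minimal sketch, assuming dplyr >= 1.0; the helper name my_summarise_chr and the argument name group_vars are made up for illustration.

```r
library(dplyr)

# Hypothetical helper: group_vars is a character vector of column names.
my_summarise_chr <- function(df, group_vars) {
  df %>%
    # all_of() errors if any of the named columns is missing from df
    group_by(across(all_of(group_vars))) %>%
    summarise(mpg = mean(mpg), .groups = "drop")
}

# One row per observed cyl/am combination
res_chr <- my_summarise_chr(mtcars, c("cyl", "am"))
res_chr
```

Because all_of() only accepts character vectors, an unquoted column name passed to this helper fails instead of being silently reinterpreted.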
dplyr < 1.0
As far as I know, you could use as.name or sym (from the rlang package - I don't know if dplyr will import it eventually):
library(dplyr)
my_summarise <- function(df, var) {
  var <- rlang::sym(var)
  df %>%
    group_by(!!var) %>%
    summarise(mpg = mean(mpg))
}
or
my_summarise <- function(df, var) {
  var <- as.name(var)
  df %>%
    group_by(!!var) %>%
    summarise(mpg = mean(mpg))
}
my_summarise(mtcars, "cyl")
# A tibble: 3 × 2
# cyl mpg
# <dbl> <dbl>
# 1 4 26.66364
# 2 6 19.74286
# 3 8 15.10000
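The same idea extends to several grouping columns: rlang::syms() turns a character vector into a list of symbols, which can then be spliced into group_by() with !!!. This is only a sketch; the helper name my_summarise_syms is an assumption and not part of the answer above.

```r
library(dplyr)

# Hypothetical helper: vars is a character vector of column names.
my_summarise_syms <- function(df, vars) {
  vars <- rlang::syms(vars)   # list of symbols, one per string
  df %>%
    group_by(!!!vars) %>%     # splice the symbols in as separate arguments
    summarise(mpg = mean(mpg))
}

res_syms <- my_summarise_syms(mtcars, c("cyl", "gear"))
res_syms
```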
Using the .data pronoun from rlang is another option that works directly with column names stored as strings. The function with .data would look like:
my_summarise <- function(df, var) {
  df %>%
    group_by(.data[[var]]) %>%
    summarise(mpg = mean(mpg))
}
my_summarise(mtcars, "cyl")
# A tibble: 3 x 2
# cyl mpg
# <dbl> <dbl>
# 1 4 26.7
# 2 6 19.7
# 3 8 15.1
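The .data pronoun can also be used for the summarised column, so that both names are supplied as strings. The sketch below assumes dplyr >= 1.0 (for glue-style names on the left of :=); the helper name my_summarise_data and its value_var argument are hypothetical.

```r
library(dplyr)

# Hypothetical helper: both column names are passed as strings.
my_summarise_data <- function(df, group_var, value_var) {
  df %>%
    group_by(.data[[group_var]]) %>%
    # "mean_{value_var}" builds the output column name, e.g. "mean_mpg"
    summarise("mean_{value_var}" := mean(.data[[value_var]]))
}

res_data <- my_summarise_data(mtcars, "cyl", "mpg")
res_data
```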