Programming with dplyr using string as input

I would like to write a function that uses dplyr internally and takes variable names as strings. Unfortunately, dplyr's use of non-standard evaluation (NSE) makes this rather complicated. From Programming with dplyr I got the following example:

my_summarise <- function(df, var) {
  var <- enquo(var)

  df %>%
    group_by(!!var) %>%
    summarise(a = mean(a))
}

my_summarise(df, g1)

However, I would like to write a function where I could provide "g1" instead of g1, and I can't wrap my head around how to do that.

dplyr >= 1.0

Use a combination of double braces and the across function:

library(dplyr)

my_summarise2 <- function(df, group_var) {
  df %>%
    group_by(across({{ group_var }})) %>%
    summarise(mpg = mean(mpg))
}

my_summarise2(mtcars, "cyl")

# A tibble: 3 x 2
#    cyl   mpg
#  <dbl> <dbl>
# 1     4  26.7
# 2     6  19.7
# 3     8  15.1

# same result as above, passing cyl without quotes
my_summarise2(mtcars, cyl)
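
If the column names only ever arrive as strings, the across() approach above can also be written with all_of(), which additionally allows grouping by several columns at once. This is a minimal sketch, assuming dplyr >= 1.0; the name my_summarise3 and the argument group_vars are made up for illustration and are not part of the original answer.

```r
library(dplyr)

# Hypothetical variant: group_vars is a character vector of column names
my_summarise3 <- function(df, group_vars) {
  df %>%
    # all_of() selects exactly the columns named in the character vector
    group_by(across(all_of(group_vars))) %>%
    summarise(mpg = mean(mpg), .groups = "drop")
}

# One or several grouping columns, all passed as strings
res_one <- my_summarise3(mtcars, "cyl")
res_two <- my_summarise3(mtcars, c("cyl", "am"))
```

As far as I can tell, all_of() raises an error when a requested column is missing, which is usually what you want inside a function.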

dplyr < 1.0

As far as I know, you can use as.name or sym (from the rlang package; I don't know whether dplyr will eventually import it):

library(dplyr)
my_summarise <- function(df, var) {
  var <- rlang::sym(var)
  df %>%
    group_by(!!var) %>%
    summarise(mpg = mean(mpg))
}

or

my_summarise <- function(df, var) {
  var <- as.name(var)
  df %>%
    group_by(!!var) %>%
    summarise(mpg = mean(mpg))
}

my_summarise(mtcars, "cyl")
# # A tibble: 3 × 2
#     cyl      mpg
#   <dbl>    <dbl>
# 1     4 26.66364
# 2     6 19.74286
# 3     8 15.10000
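
The same idea should extend to more than one grouping column: rlang::syms() turns a character vector into a list of symbols, and !!! splices that list into group_by(). A rough sketch, where my_summarise_multi is a hypothetical name chosen for this example:

```r
library(dplyr)

# Hypothetical helper: vars is a character vector of column names
my_summarise_multi <- function(df, vars) {
  vars <- rlang::syms(vars)  # list of symbols, one per column name
  df %>%
    group_by(!!!vars) %>%    # splice the symbols in as separate arguments
    summarise(mpg = mean(mpg))
}

res_multi <- my_summarise_multi(mtcars, c("cyl", "gear"))
```

The result has one row per combination of cyl and gear that occurs in mtcars.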

Using the .data pronoun from rlang is another option that works directly with column names stored as strings.

The function with .data would look like:

my_summarise <- function(df, var) {
  df %>%
    group_by(.data[[var]]) %>%
    summarise(mpg = mean(mpg))
}

my_summarise(mtcars, "cyl")
# A tibble: 3 x 2
#     cyl   mpg
#   <dbl> <dbl>
# 1     4  26.7
# 2     6  19.7
# 3     8  15.1
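
The .data pronoun can be used the same way for the summarised column, so that both names come in as strings. The sketch below assumes that is what you want; my_summarise4 and value_var are hypothetical names, and the !! with := on the left-hand side is there so the output column is named after the string.

```r
library(dplyr)

# Hypothetical variant: both the grouping column and the value column are strings
my_summarise4 <- function(df, group_var, value_var) {
  df %>%
    group_by(.data[[group_var]]) %>%
    # !!value_var := names the result column after the string in value_var
    summarise(!!value_var := mean(.data[[value_var]]))
}

res_hp <- my_summarise4(mtcars, "cyl", "hp")
```

Here res_hp has the columns cyl and hp, with hp holding the mean horsepower per cylinder group.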

Tags: r, dplyr

Raivo Kolde

2 Answers

lukeA

aosmith
