
Programming with dplyr using string as input

Tags: r, dplyr

I would like to write a function that uses dplyr internally and takes variable names as strings. Unfortunately, dplyr's use of non-standard evaluation (NSE) makes this rather complicated. The "Programming with dplyr" vignette gives the following example:

my_summarise <- function(df, var) {
  var <- enquo(var)

  df %>%
    group_by(!!var) %>%
    summarise(a = mean(a))
}

my_summarise(df, g1)

However, I would like to write a function where I could provide "g1" (a string) instead of g1, and I cannot work out how to do that.

Raivo Kolde asked May 22 '17 20:05




2 Answers

dplyr >= 1.0

Use a combination of double braces and the across() function. This accepts the column name either as a string or as a bare name:

my_summarise2 <- function(df, group_var) {
  df %>% group_by(across({{ group_var }})) %>% 
    summarise(mpg = mean(mpg))
}

my_summarise2(mtcars, "cyl")

# A tibble: 3 x 2
#    cyl   mpg
#  <dbl> <dbl>
# 1     4  26.7
# 2     6  19.7
# 3     8  15.1

# same result as above, passing cyl without quotes
my_summarise2(mtcars, cyl)
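
The same idea can probably be extended to several grouping columns supplied as a character vector. The following is a minimal sketch, assuming dplyr >= 1.0; the helper name my_summarise_multi and its group_vars argument are made up for illustration, and all_of() is used so that a misspelled column name raises an error instead of being silently ignored.

```r
library(dplyr)

# Hypothetical helper: `group_vars` is a character vector of column names.
my_summarise_multi <- function(df, group_vars) {
  df %>%
    # all_of() selects exactly the named columns and errors on unknown names
    group_by(across(all_of(group_vars))) %>%
    summarise(mpg = mean(mpg), .groups = "drop")
}

res_multi <- my_summarise_multi(mtcars, c("cyl", "am"))
res_multi
```

Passing a single string such as "cyl" should work as well, since a length-one character vector is handled the same way.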

dplyr < 1.0

As far as I know, you can use as.name() or sym() (from the rlang package) to turn the string into a symbol, and then unquote it with !!:

library(dplyr)
my_summarise <- function(df, var) {
  var <- rlang::sym(var)
  df %>%
    group_by(!!var) %>%
    summarise(mpg = mean(mpg))
}

or

my_summarise <- function(df, var) {
  var <- as.name(var)
  df %>%
    group_by(!!var) %>%
    summarise(mpg = mean(mpg))
}

my_summarise(mtcars, "cyl")
# A tibble: 3 × 2
#     cyl      mpg
#   <dbl>    <dbl>
# 1     4 26.66364
# 2     6 19.74286
# 3     8 15.10000
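
For more than one string, a similar approach should work with rlang::syms() and the !!! (unquote-splice) operator. This is only a sketch; the function name my_summarise_syms is an assumption for illustration and is not part of the answer above.

```r
library(dplyr)

# Hypothetical helper: `vars` is a character vector of column names.
my_summarise_syms <- function(df, vars) {
  vars <- rlang::syms(vars)  # convert each string to a symbol
  df %>%
    group_by(!!!vars) %>%    # splice the list of symbols into group_by()
    summarise(mpg = mean(mpg))
}

res_syms <- my_summarise_syms(mtcars, c("cyl", "gear"))
res_syms
```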
lukeA answered Oct 21 '22 17:10


Using the .data pronoun from rlang is another option that works directly with column names stored as strings.

The function with .data would look like this:

my_summarise <- function(df, var) {
  df %>%
    group_by(.data[[var]]) %>%
    summarise(mpg = mean(mpg))
}

my_summarise(mtcars, "cyl")
# A tibble: 3 x 2
#     cyl   mpg
#   <dbl> <dbl>
# 1     4  26.7
# 2     6  19.7
# 3     8  15.1
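
The .data pronoun can also be used for the summarised column, so that both names arrive as strings. The sketch below assumes dplyr >= 1.0 with a recent rlang, since it relies on glue-style names on the left of :=; the helper name my_summarise_data and the value_var argument are hypothetical.

```r
library(dplyr)

# Hypothetical helper: both column names are supplied as strings.
my_summarise_data <- function(df, group_var, value_var) {
  df %>%
    group_by(.data[[group_var]]) %>%
    # "mean_{value_var}" builds the output column name from the string
    summarise("mean_{value_var}" := mean(.data[[value_var]]))
}

res_data <- my_summarise_data(mtcars, "cyl", "hp")
res_data
```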
aosmith answered Oct 21 '22 17:10