standard evaluation in dplyr: summarise a variable given as a character string

Question

UPDATE July 2020:

dplyr 1.0 has changed pretty much everything about this question as well as all of the answers. See the dplyr programming vignette here:

https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html

The new way to refer to columns when their identifier is stored as a character vector is to use the .data pronoun from rlang, and then subset as you would in base R.

    library(dplyr)

    key <- "v3"
    val <- "v2"
    drp <- "v1"

    df <- tibble(v1 = 1:5, v2 = 6:10, v3 = c(rep("A", 3), rep("B", 2)))

    df %>%
      select(-matches(drp)) %>%
      group_by(.data[[key]]) %>%
      summarise(total = sum(.data[[val]], na.rm = TRUE))

    #> `summarise()` ungrouping output (override with `.groups` argument)
    #> # A tibble: 2 x 2
    #>   v3    total
    #>   <chr> <int>
    #> 1 A        21
    #> 2 B        19

If your code is in a package function, you can @importFrom rlang .data to avoid R check notes about undefined globals.
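As a rough illustration of how the .data approach might be wrapped in a reusable function, here is a minimal sketch. The function name sum_by and its arguments are hypothetical and not part of dplyr; the sketch assumes dplyr >= 1.0, where summarise() accepts a .groups argument.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): sums the column named by `val`
# within the groups defined by the column named by `key`.
# Both `key` and `val` are expected to be length-one character vectors.
sum_by <- function(data, key, val) {
  data %>%
    group_by(.data[[key]]) %>%
    summarise(total = sum(.data[[val]], na.rm = TRUE), .groups = "drop")
}

df <- tibble(v1 = 1:5, v2 = 6:10, v3 = c(rep("A", 3), rep("B", 2)))

res <- sum_by(df, key = "v3", val = "v2")
res
```

In a package, the roxygen tag mentioned above would sit in the documentation block of a function like this one.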

ORIGINAL QUESTION:

I want to refer to an unknown column name inside a summarise. The standard evaluation functions introduced in dplyr 0.3 allow column names to be referenced using variables, but this doesn't appear to work when you call a base R function within e.g. a summarise.

    library(dplyr)

    key <- "v3"
    val <- "v2"
    drp <- "v1"

    df <- data_frame(v1 = 1:5, v2 = 6:10, v3 = c(rep("A", 3), rep("B", 2)))

The df looks like this:

    > df
    Source: local data frame [5 x 3]

      v1 v2 v3
    1  1  6  A
    2  2  7  A
    3  3  8  A
    4  4  9  B
    5  5 10  B

I want to drop v1, group by v3, and sum v2 for each group:

    df %>% select(-matches(drp)) %>% group_by_(key) %>% summarise_(sum(val, na.rm = TRUE))

    Error in sum(val, na.rm = TRUE) : invalid 'type' (character) of argument

The NSE version of select() works fine, since it can match a character string. The SE version of group_by() works fine, since it can now accept variables as arguments and evaluate them. However, I haven't found a way to achieve similar results when using base R functions inside dplyr functions.
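The error above is consistent with sum() receiving the string itself rather than the column it names. A small base R sketch, using a plain data.frame with the same values (an assumption made so the example needs no packages), shows the difference between summing the string and summing the column looked up by that string:

```r
key <- "v3"
val <- "v2"

# Same values as the example data, built as a base data.frame.
df <- data.frame(v1 = 1:5, v2 = 6:10, v3 = c(rep("A", 3), rep("B", 2)))

# sum() is handed the character string "v2", not a column, so it fails.
msg <- tryCatch(sum(val, na.rm = TRUE), error = function(e) conditionMessage(e))
msg

# Looking the column up by name first gives sum() a numeric vector.
total <- sum(df[[val]], na.rm = TRUE)

# Grouped version in base R, again indexing both columns by string.
by_group <- tapply(df[[val]], df[[key]], sum, na.rm = TRUE)
by_group
```

This is only meant to show what the string-to-column lookup has to accomplish; it does not address how to do it inside summarise_().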

Things that don't work:

    df %>% group_by_(key) %>% summarise_(sum(get(val), na.rm = TRUE))
    Error in get(val) : object 'v2' not found

    df %>% group_by_(key) %>% summarise_(sum(eval(as.symbol(val)), na.rm = TRUE))
    Error in eval(expr, envir, enclos) : object 'v2' not found

I've checked out several related questions, but none of the proposed solutions have worked for me so far.

Henrik · Accepted Answer

Please note that this answer does not apply to dplyr >= 0.7.0, but to previous versions.

dplyr 0.7.0 has a new approach to non-standard evaluation (NSE) called tidyeval. It is described in detail in vignette("programming").
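For readers on dplyr >= 0.7.0, one common tidyeval pattern is to convert the string to a symbol with rlang::sym() and unquote it with !!. The following is a sketch of that pattern applied to the question's data, not something taken from the original answer:

```r
library(dplyr)
library(rlang)

key <- "v3"
val <- "v2"
drp <- "v1"

df <- tibble(v1 = 1:5, v2 = 6:10, v3 = c(rep("A", 3), rep("B", 2)))

# sym() turns each string into a symbol; !! unquotes it so dplyr
# sees the column name as if it had been typed directly.
res <- df %>%
  select(-matches(drp)) %>%
  group_by(!!sym(key)) %>%
  summarise(total = sum(!!sym(val), na.rm = TRUE))

res
```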

For versions before 0.7.0, the dplyr vignette on non-standard evaluation is helpful here. The section "Mixing constants and variables" shows that the function interp from the lazyeval package can be used, and advises: "[u]se as.name if you have a character string that gives a variable name":

    library(lazyeval)

    df %>%
      select(-matches(drp)) %>%
      group_by_(key) %>%
      summarise_(sum_val = interp(~sum(var, na.rm = TRUE), var = as.name(val)))
    #   v3 sum_val
    # 1  A      21
    # 2  B      19
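To see what interp contributes in the code above, it may help to inspect the formula it builds on its own, outside of any dplyr call. This sketch assumes the lazyeval package is installed and uses its f_rhs() helper to extract the right-hand side of the formula:

```r
library(lazyeval)

val <- "v2"

# interp() substitutes the placeholder `var` in the formula with the
# symbol created by as.name(val), so the expression refers to column v2.
f <- interp(~sum(var, na.rm = TRUE), var = as.name(val))

# Extract the right-hand side of the formula and show it as text.
expr_text <- deparse(f_rhs(f))
expr_text
```

The interpolated expression names the column directly, which is the form summarise_() needs to evaluate it against the grouped data.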
