UPDATE July 2020: dplyr 1.0 has changed pretty much everything about this question as well as all of the answers. See the dplyr programming vignette here:
https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html
The new way to refer to columns when their identifier is stored as a character vector is to use the .data pronoun from rlang, and then subset as you would in base R.
    library(dplyr)

    key <- "v3"
    val <- "v2"
    drp <- "v1"

    df <- tibble(v1 = 1:5,
                 v2 = 6:10,
                 v3 = c(rep("A", 3), rep("B", 2)))

    df %>%
      select(-matches(drp)) %>%
      group_by(.data[[key]]) %>%
      summarise(total = sum(.data[[val]], na.rm = TRUE))
    #> `summarise()` ungrouping output (override with `.groups` argument)
    #> # A tibble: 2 x 2
    #>   v3    total
    #>   <chr> <int>
    #> 1 A        21
    #> 2 B        19
If your code is in a package function, you can @importFrom rlang .data to avoid R CMD check notes about undefined globals.
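As a minimal sketch of how this might look inside a package function: the helper name sum_by and its arguments are hypothetical, made up for illustration, and the roxygen tag is the one mentioned above. It assumes dplyr >= 1.0.0, since it relies on the .groups argument.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): sums one column within the groups
# of another, with both column names supplied as character strings.
#' @importFrom rlang .data
sum_by <- function(data, group_col, value_col) {
  data %>%
    group_by(.data[[group_col]]) %>%
    summarise(total = sum(.data[[value_col]], na.rm = TRUE),
              .groups = "drop")  # return an ungrouped tibble, no message
}

df <- tibble(v1 = 1:5,
             v2 = 6:10,
             v3 = c(rep("A", 3), rep("B", 2)))

res <- sum_by(df, group_col = "v3", value_col = "v2")
```

Passing the column names as plain strings keeps the function's interface in standard evaluation, so callers do not need to know anything about tidy evaluation.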
ORIGINAL QUESTION:
I want to refer to an unknown column name inside a summarise. The standard evaluation functions introduced in dplyr 0.3 allow column names to be referenced using variables, but this doesn't appear to work when you call a base R function within e.g. a summarise.
    library(dplyr)

    key <- "v3"
    val <- "v2"
    drp <- "v1"

    df <- data_frame(v1 = 1:5,
                     v2 = 6:10,
                     v3 = c(rep("A", 3), rep("B", 2)))
The df looks like this:
    > df
    Source: local data frame [5 x 3]

      v1 v2 v3
    1  1  6  A
    2  2  7  A
    3  3  8  A
    4  4  9  B
    5  5 10  B
I want to drop v1, group by v3, and sum v2 for each group:
    df %>%
      select(-matches(drp)) %>%
      group_by_(key) %>%
      summarise_(sum(val, na.rm = TRUE))

    Error in sum(val, na.rm = TRUE) : invalid 'type' (character) of argument
The NSE version of select() works fine, since it can match a character string. The SE version of group_by() works fine, since it can now accept variables as arguments and evaluate them. However, I haven't found a way to achieve similar results when using base R functions inside dplyr functions.
Things that don't work:
    df %>%
      group_by_(key) %>%
      summarise_(sum(get(val), na.rm = TRUE))

    Error in get(val) : object 'v2' not found

    df %>%
      group_by_(key) %>%
      summarise_(sum(eval(as.symbol(val)), na.rm = TRUE))

    Error in eval(expr, envir, enclos) : object 'v2' not found
I've checked out several related questions, but none of the proposed solutions have worked for me so far.
Please note that this answer does not apply to dplyr >= 0.7.0, but to previous versions.
dplyr 0.7.0 has a new approach to non-standard evaluation (NSE) called tidyeval. It is described in detail in vignette("programming").
The dplyr vignette on non-standard evaluation is helpful here. Check the section "Mixing constants and variables" and you find that the function interp from package lazyeval could be used, and "[u]se as.name if you have a character string that gives a variable name":
    library(lazyeval)

    df %>%
      select(-matches(drp)) %>%
      group_by_(key) %>%
      summarise_(sum_val = interp(~sum(var, na.rm = TRUE), var = as.name(val)))
    #   v3 sum_val
    # 1  A      21
    # 2  B      19
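For dplyr >= 0.7.0, where the note at the top of this answer applies, a rough tidyeval counterpart of the as.name plus interp idea is to turn each string into a symbol with rlang::sym() and unquote it with !!. This is only a sketch and not part of the original answer. The names key_sym and val_sym are made up for illustration, and tibble() stands in for data_frame().

```r
library(dplyr)

key <- "v3"
val <- "v2"
drp <- "v1"

df <- tibble(v1 = 1:5,
             v2 = 6:10,
             v3 = c(rep("A", 3), rep("B", 2)))

# Convert the character strings to symbols once, up front.
key_sym <- rlang::sym(key)
val_sym <- rlang::sym(val)

res <- df %>%
  select(-matches(drp)) %>%
  group_by(!!key_sym) %>%                            # unquote the grouping column
  summarise(sum_val = sum(!!val_sym, na.rm = TRUE))  # unquote inside the base R call
```

Because !! substitutes the symbol before the verb evaluates its arguments, sum() sees the column v2 rather than the string "v2", which is what the failing attempts in the question were missing.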