I want to filter a dataframe using a field which is defined in a variable, to select a value that is also in a variable. Say I have
df <- data.frame(V=c(6, 1, 5, 3, 2), Unhappy=c("N", "Y", "Y", "Y", "N"))
fld <- "Unhappy"
sval <- "Y"
The value I want would be df[df$Unhappy == "Y", ].
I've read the nse vignette to try to use filter_ but can't quite understand it. I tried
df %>% filter_(.dots = ~ fld == sval)
which returned nothing. I got what I wanted with
df %>% filter_(.dots = ~ Unhappy == sval)
but obviously that defeats the purpose of having a variable to store the field name. Any clues please? Eventually I want to use this where fld is a vector of field names and sval is a vector of filter values for each field in fld.
You can try with interp from lazyeval:
library(lazyeval)
library(dplyr)
df %>%
filter_(interp(~v==sval, v=as.name(fld)))
# V Unhappy
#1 1 Y
#2 5 Y
#3 3 Y
For multiple key/value pairs, I found that this works, though there is probably a better way (df1, fld1 and sval1 are defined at the end of this answer).
df1 %>%
filter_(interp(~v==sval1[1] & y ==sval1[2],
.values=list(v=as.name(fld1[1]), y= as.name(fld1[2]))))
# V Unhappy Col2
#1 1 Y B
#2 5 Y B
For these cases, I find the base R option to be easier. For example, if we are trying to filter the rows based on the 'key' variables in 'fld1' with corresponding values in 'sval1', one option is using Map. We subset the dataset (df1[fld1]), apply the FUN (==) to each column of df1[fld1] with the corresponding value in 'sval1', and use & with Reduce to get a logical vector that can be used to filter the rows of 'df1'.
df1[Reduce(`&`, Map(`==`, df1[fld1],sval1)),]
# V Unhappy Col2
#2 1 Y B
#3 5 Y B
Data used in the multi-column examples above:
df1 <- cbind(df, Col2= c("A", "B", "B", "C", "A"))
fld1 <- c(fld, 'Col2')
sval1 <- c(sval, 'B')
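The Map/Reduce idea described above can be wrapped in a small helper. This is only a sketch: the function name filter_by_pairs and its argument names are hypothetical (not from any package), and it assumes the field names and values are supplied in matching order.

```r
# Hypothetical helper: keep rows where every named column equals its paired value.
filter_by_pairs <- function(data, fields, values) {
  # Assumption: one value per field, and every field is a column of `data`.
  stopifnot(length(fields) == length(values), all(fields %in% names(data)))
  # One logical vector per column, then combine them all with `&`.
  keep <- Reduce(`&`, Map(`==`, data[fields], values))
  data[keep, , drop = FALSE]
}

df1 <- data.frame(
  V = c(6, 1, 5, 3, 2),
  Unhappy = c("N", "Y", "Y", "Y", "N"),
  Col2 = c("A", "B", "B", "C", "A")
)
res_pairs <- filter_by_pairs(df1, c("Unhappy", "Col2"), c("Y", "B"))
```

Called with a single field and value, the same helper covers the one-column case from the question.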
Here's an alternative with base R, which is maybe not very elegant, but it might have the benefit of being rather easily understandable:
df[df[colnames(df)==fld]==sval,]
# V Unhappy
#2 1 Y
#3 5 Y
#4 3 Y
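A slightly shorter base R variant, shown here as a sketch that assumes fld names exactly one existing column, extracts the column with [[ instead of matching against colnames():

```r
df <- data.frame(V = c(6, 1, 5, 3, 2), Unhappy = c("N", "Y", "Y", "Y", "N"))
fld <- "Unhappy"
sval <- "Y"

# df[[fld]] returns the column named by the string in fld as a plain vector,
# so the comparison gives one TRUE/FALSE per row.
res_base <- df[df[[fld]] == sval, ]
```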
rlang 0.4.0 introduces a new, more intuitive way to handle this type of use case:
library(dplyr)
packageVersion("rlang")
# [1] ‘0.4.0’
df <- data.frame(V=c(6, 1, 5, 3, 2), Unhappy=c("N", "Y", "Y", "Y", "N"))
fld <- "Unhappy"
sval <- "Y"
df %>% filter(.data[[fld]]==sval)
#OR, inside a function that takes the column name unquoted
filter_col_val <- function(df, fld, sval) {
  df %>% filter({{fld}}==sval)
}
filter_col_val(df, Unhappy, "Y")
More information can be found at https://www.tidyverse.org/articles/2019/06/rlang-0-4-0/
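For the case mentioned in the question where fld and sval are vectors, one possible extension of the .data approach is to build one condition per field/value pair and splice them into filter() with !!!. This is a sketch rather than an official recipe; the name conds is made up for the example, and it assumes the two vectors line up pairwise.

```r
library(dplyr)

df1 <- data.frame(
  V = c(6, 1, 5, 3, 2),
  Unhappy = c("N", "Y", "Y", "Y", "N"),
  Col2 = c("A", "B", "B", "C", "A")
)
fld1 <- c("Unhappy", "Col2")
sval1 <- c("Y", "B")

# Build expressions such as .data[["Unhappy"]] == "Y", one per pair.
# unname() matters: Map() names its result after fld1, and filter()
# rejects named arguments.
conds <- unname(Map(function(f, v) rlang::expr(.data[[!!f]] == !!v), fld1, sval1))

# filter() combines the spliced conditions with a logical AND.
res_multi <- df1 %>% filter(!!!conds)
```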
Previous Answer
With dplyr 0.6.0 and later, this code works:
library(dplyr)
packageVersion("dplyr")
# [1] ‘0.7.1’
df <- data.frame(V=c(6, 1, 5, 3, 2), Unhappy=c("N", "Y", "Y", "Y", "N"))
fld <- "Unhappy"
sval <- "Y"
df %>% filter(UQ(rlang::sym(fld))==sval)
#OR
df %>% filter((!!rlang::sym(fld))==sval)
#OR
fld <- quo(Unhappy)
sval <- "Y"
df %>% filter(UQ(fld)==sval)
More about the dplyr syntax is available at http://dplyr.tidyverse.org/articles/programming.html and about quosure usage in the rlang package at https://cran.r-project.org/web/packages/rlang/index.html .
If you find it challenging to master non-standard evaluation in dplyr 0.6+, Alex Hayes has an excellent write-up on the topic: https://www.alexpghayes.com/blog/gentle-tidy-eval-with-examples/
Original Answer
With dplyr version 0.5.0 and later, it is possible to use a simpler syntax that gets closer to what @Ricky originally wanted, and which I also find more readable than using lazyeval::interp:
df %>% filter_(.dots = paste0(fld, "=='", sval, "'"))
# V Unhappy
#1 1 Y
#2 5 Y
#3 3 Y
#OR
df %>% filter_(.dots = glue::glue("{fld}=='{sval}'"))