I'd like to define a helper function to help me compose some boolean filters more clearly.
This is a working example of the result using the iris dataset:
library(tidyverse)
sepal_config = function(length, width, species, .data) {
  .data$Sepal.Length > length & .data$Sepal.Width < width & .data$Species == species
}
iris %>%
  filter(
    sepal_config(length = 4, width = 3, species = "versicolor", .data = .data) | # 34 rows
      sepal_config(length = 3, width = 3, species = "virginica", .data = .data) # 21 rows
  ) # 55 rows
I want to do this without having to pass in .data, and ideally to also have the column names evaluated in the dataframe scope (i.e., avoiding this error):
sepal_config = function(length, width, species) {
  Sepal.Length > length & Sepal.Width < width & Species == species
}
iris %>%
  filter(
    sepal_config(length = 4, width = 3, species = "versicolor") |
      sepal_config(length = 3, width = 3, species = "virginica")
  )
Error: Problem with `filter()` input `..1`.
ℹ Input `..1` is `|...`.
x object 'Sepal.Length' not found
Unfortunately I don't understand NSE well enough to know if this is an option. I have tried various techniques from the programming with dplyr how-to guide, but the footnote makes me think I am looking in the wrong place.
dplyr’s filter() is inspired by base R’s subset(). subset() provides data masking, but not with tidy evaluation, so the techniques described in this chapter don’t apply to it.
Thanks, Akhil
The filter() function is used to subset a data frame, retaining all rows that satisfy your conditions. To be retained, the row must produce a value of TRUE for all conditions. Note that when a condition evaluates to NA the row will be dropped, unlike base subsetting with [ .
The filter() function in dplyr subsets a data frame based on the conditions you provide. A row is kept only if the condition evaluates to TRUE for it; rows for which the condition is FALSE or NA are dropped.
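As a small illustration of the NA behaviour described above, here is a sketch on a made-up three-row data frame (the `df` object and its `x` column are hypothetical and not part of the question). It compares filter() with base subsetting using `[`.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy data: the middle value is missing
df <- data.frame(x = c(1, NA, 3))

# filter() keeps only the rows where the condition is TRUE,
# so the NA row is dropped along with the FALSE row
kept_by_filter <- filter(df, x > 1)

# Base `[` keeps the NA position and returns it as a row of NAs
kept_by_bracket <- df[df$x > 1, , drop = FALSE]

nrow(kept_by_filter)
nrow(kept_by_bracket)
```

The condition `df$x > 1` is `FALSE, NA, TRUE`, so filter() returns one row while `[` returns two.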
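dplyr functions use non-standard evaluation. That is why you do not have to quote your variable names when you do something like select(mtcars, mpg). (In current versions of dplyr, select(mtcars, "mpg") also works, because column selection accepts strings, but inside filter() a quoted name is just a string, not a column.) When you use dplyr inside your own functions, you need to control where the column names are evaluated, for example with quosures and the !! operator.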
You can wrap the expression in your function with quo() to defuse it, and use the !! operator to inject it into the filter() call.
library(dplyr)
sepal_config = function(length, width, species) {
  quo(Sepal.Length > length & Sepal.Width < width & Species == species)
}
iris %>%
  filter(!!sepal_config(length = 4, width = 3, species = "versicolor") |
           !!sepal_config(length = 3, width = 3, species = "virginica"))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.5 2.3 4.0 1.3 versicolor
2 6.5 2.8 4.6 1.5 versicolor
3 5.7 2.8 4.5 1.3 versicolor
4 4.9 2.4 3.3 1.0 versicolor
5 6.6 2.9 4.6 1.3 versicolor
6 5.2 2.7 3.9 1.4 versicolor
7 5.0 2.0 3.5 1.0 versicolor
8 6.0 2.2 4.0 1.0 versicolor
9 6.1 2.9 4.7 1.4 versicolor
10 5.6 2.9 3.6 1.3 versicolor
...
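To see why this works, it may help to inspect the quosure directly. The sketch below assumes the rlang package is installed (it is a dependency of dplyr); the names `q`, `mask_result`, and `n_true` are only for illustration. The quosure remembers the function environment, so `length`, `width`, and `species` resolve to the arguments, while the column names are looked up in the data.

```r
library(dplyr, warn.conflicts = FALSE)

sepal_config <- function(length, width, species) {
  # quo() defuses the expression and records this function's environment
  quo(Sepal.Length > length & Sepal.Width < width & Species == species)
}

q <- sepal_config(length = 4, width = 3, species = "versicolor")

# The helper returns a quosure, not a logical vector
rlang::is_quosure(q)

# Evaluating it with iris as the data mask gives one logical value per row
mask_result <- rlang::eval_tidy(q, data = iris)
n_true <- sum(mask_result)

# Injecting the same quosure into filter() keeps exactly those rows
nrow(filter(iris, !!q))
```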
dplyr provides a function cur_data() for this sort of thing:
library(dplyr, warn.conflicts = FALSE)
sepal_config <- function(length, width, species, .data = cur_data()) {
  .data$Sepal.Length > length & .data$Sepal.Width < width & .data$Species == species
}
iris %>%
  as_tibble() %>%
  filter(
    sepal_config(length = 4, width = 3, species = "versicolor") | # 34 rows
      sepal_config(length = 3, width = 3, species = "virginica") # 21 rows
  )
#> # A tibble: 55 x 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 5.5 2.3 4 1.3 versicolor
#> 2 6.5 2.8 4.6 1.5 versicolor
#> 3 5.7 2.8 4.5 1.3 versicolor
#> 4 4.9 2.4 3.3 1 versicolor
#> 5 6.6 2.9 4.6 1.3 versicolor
#> 6 5.2 2.7 3.9 1.4 versicolor
#> 7 5 2 3.5 1 versicolor
#> 8 6 2.2 4 1 versicolor
#> 9 6.1 2.9 4.7 1.4 versicolor
#> 10 5.6 2.9 3.6 1.3 versicolor
#> # ... with 45 more rows
Created on 2021-10-12 by the reprex package (v2.0.0)
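Note that cur_data() was deprecated in dplyr 1.1.0 in favour of pick(). A possible adaptation of the same idea is sketched below, assuming dplyr 1.1.0 or later and assuming that pick(everything()) is an acceptable stand-in when called from a helper's default argument. This is not tested against the reprex above, and the `result` name is only for illustration.

```r
library(dplyr, warn.conflicts = FALSE)

# Assumption: pick(everything()) returns the current (non-grouping) columns
# when it is evaluated inside a data-masking verb such as filter()
sepal_config <- function(length, width, species, .data = pick(everything())) {
  .data$Sepal.Length > length & .data$Sepal.Width < width & .data$Species == species
}

result <- iris %>%
  filter(
    sepal_config(length = 4, width = 3, species = "versicolor") |
      sepal_config(length = 3, width = 3, species = "virginica")
  )

nrow(result)
```

On grouped data, pick() leaves out the grouping columns, so a helper that refers to a grouping column would need a different approach.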