Using a function in dplyr filter

Tags: r, filter, dplyr

I'd like to define a helper function to help me compose some boolean filters more clearly.

This is a working example of the result, using the iris dataset:

library(tidyverse)


sepal_config = function(length, width, species, .data) {
  .data$Sepal.Length > length & .data$Sepal.Width < width & .data$Species == species
}

iris %>% 
  filter(
      sepal_config(length = 4, width = 3, species = "versicolor", .data = .data) |  # 34 rows
      sepal_config(length = 3, width = 3, species = "virginica",  .data = .data)    # 21 rows
    )                                                                               # 55 rows

I want to do this without having to pass in .data, and ideally also have the column names evaluated in the data frame's scope (i.e., avoiding the error below):

sepal_config = function(length, width, species) {
  Sepal.Length > length & Sepal.Width < width & Species == species
}

iris %>% 
  filter(
      sepal_config(length = 4, width = 3, species = "versicolor") |
      sepal_config(length = 3, width = 3, species = "virginica")
    )                                                               
Error: Problem with `filter()` input `..1`.
ℹ Input `..1` is `|...`.
x object 'Sepal.Length' not found

Unfortunately I don't understand NSE well enough to know if this is an option. I have tried various techniques from the programming with dplyr how-to guide, but the footnote makes me think I am looking in the wrong place.

dplyr’s filter() is inspired by base R’s subset(). subset() provides data masking, but not with tidy evaluation, so the techniques described in this chapter don’t apply to it.

Thanks, Akhil

Akhil Nair asked Oct 12 '21 09:10



2 Answers

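You can wrap the expression in your function with quo() to defuse it, and then use the !! operator to inject it into the filter() call. The quosure remembers the function's environment, so length, width and species still resolve to the arguments, while the column names are looked up in the data frame.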

library(dplyr)

sepal_config = function(length, width, species) {
  quo(Sepal.Length > length & Sepal.Width < width & Species == species)
}

iris %>% 
  filter(!!sepal_config(length = 4, width = 3, species = "versicolor") |
         !!sepal_config(length = 3, width = 3, species = "virginica"))


   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1           5.5         2.3          4.0         1.3 versicolor
2           6.5         2.8          4.6         1.5 versicolor
3           5.7         2.8          4.5         1.3 versicolor
4           4.9         2.4          3.3         1.0 versicolor
5           6.6         2.9          4.6         1.3 versicolor
6           5.2         2.7          3.9         1.4 versicolor
7           5.0         2.0          3.5         1.0 versicolor
8           6.0         2.2          4.0         1.0 versicolor
9           6.1         2.9          4.7         1.4 versicolor
10          5.6         2.9          3.6         1.3 versicolor
...
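
As a rough sketch of what the quosure approach makes possible (not part of the original answer): a quosure can be evaluated against a data frame directly with rlang::eval_tidy(), and several configurations can be OR-ed together before injecting them. The names configs and any_config are hypothetical, and purrr::reduce() is just one assumed way of combining the conditions.

```r
library(dplyr)

sepal_config <- function(length, width, species) {
  quo(Sepal.Length > length & Sepal.Width < width & Species == species)
}

# Evaluate one quosure against iris directly; this is roughly what
# filter() does once !! has injected it. Columns come from the data,
# length/width/species come from the function call that built the quosure.
q <- sepal_config(length = 4, width = 3, species = "versicolor")
n_versicolor <- sum(rlang::eval_tidy(q, data = iris))

# Hypothetical extension: OR together any number of configurations.
configs <- list(
  sepal_config(length = 4, width = 3, species = "versicolor"),
  sepal_config(length = 3, width = 3, species = "virginica")
)
any_config <- purrr::reduce(configs, function(a, b) quo(!!a | !!b))

result <- iris %>% filter(!!any_config)
```

With the configurations from the question, n_versicolor should match the 34 rows and result the 55 rows reported there.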
Ritchie Sacramento answered Nov 11 '22 06:11


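dplyr provides a function, cur_data(), for this sort of thing (note that it has been deprecated in favour of pick() since dplyr 1.1.0). Used as the default value of .data, it gives the helper access to the current data frame without it being passed explicitly: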

library(dplyr, warn.conflicts = FALSE)

sepal_config <- function(length, width, species, .data = cur_data()) {
  .data$Sepal.Length > length & .data$Sepal.Width < width & .data$Species == species
}

iris %>% 
  as_tibble() %>% 
  filter(
    sepal_config(length = 4, width = 3, species = "versicolor") |  # 34 rows
    sepal_config(length = 3, width = 3, species = "virginica")     # 21 rows
  )     
#> # A tibble: 55 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species   
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>     
#>  1          5.5         2.3          4           1.3 versicolor
#>  2          6.5         2.8          4.6         1.5 versicolor
#>  3          5.7         2.8          4.5         1.3 versicolor
#>  4          4.9         2.4          3.3         1   versicolor
#>  5          6.6         2.9          4.6         1.3 versicolor
#>  6          5.2         2.7          3.9         1.4 versicolor
#>  7          5           2            3.5         1   versicolor
#>  8          6           2.2          4           1   versicolor
#>  9          6.1         2.9          4.7         1.4 versicolor
#> 10          5.6         2.9          3.6         1.3 versicolor
#> # ... with 45 more rows

Created on 2021-10-12 by the reprex package (v2.0.0)
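
One consequence of this design, sketched here as an illustration rather than as part of the original answer: the helper is ordinary standard evaluation, so it can be checked outside a pipeline by passing .data explicitly. In that case the cur_data() default is never evaluated. The names keep and n_keep are hypothetical.

```r
library(dplyr, warn.conflicts = FALSE)

sepal_config <- function(length, width, species, .data = cur_data()) {
  .data$Sepal.Length > length & .data$Sepal.Width < width & .data$Species == species
}

# Supplying .data overrides the default, so this works outside filter()
# and returns one logical value per row of iris.
keep <- sepal_config(length = 4, width = 3, species = "versicolor", .data = iris)
n_keep <- sum(keep)
```

Here n_keep should agree with the 34 rows noted for the versicolor configuration in the question.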

wurli answered Nov 11 '22 05:11