
How to filter a data frame programmatically with dplyr and tidy evaluation?

Let's say I want to filter the starwars data frame programmatically. Here's a simple example that lets me filter based on homeworld and species:

library(tidyverse)

# a function that allows the user to supply filters
filter_starwars <- function(filters) {
  for (filter in filters) {
    starwars = filter_at(starwars, filter$var, all_vars(. %in% filter$values))
  }

  return(starwars)
}

# filter Star Wars characters that are human, and from either Tatooine or Alderaan
filter_starwars(filters = list(
  list(var = "homeworld", values = c("Tatooine", "Alderaan")),
  list(var = "species", values = "Human")
))

But this doesn't let me specify, say, a height filter: I've hard-coded the %in% operator in the .vars_predicate of filter_at(), whereas a height filter would need one of the >, >=, <, <=, or == operators.

What is the best way to write the filter_starwars() function so that the user can supply filters that are general enough to filter along any column and use any operator?

NB: with the now-deprecated filter_() function, I could pass a string:

filter_(starwars, "species == 'Human' & homeworld %in% c('Tatooine', 'Alderaan') & height > 175")

But again, that has been deprecated.

asked Jul 16 '17 23:07 by tws


2 Answers

Here are some approaches.

1) For this particular problem we don't actually need filter_, rlang or similar. This works:

filter_starwars <- function(...) {
    filter(starwars, ...)
}

# test
filter_starwars(species == 'Human',
                homeworld %in% c('Tatooine', 'Alderaan'),
                height > 175)

2) If it is important to have character arguments then:

library(rlang)

filter_starwars <- function(...) {
    filter(starwars, !!!parse_exprs(paste(..., sep = ";")))
}

# test
filter_starwars("species == 'Human'", 
                "homeworld %in% c('Tatooine', 'Alderaan')", 
                "height > 175")

2a) or if a single character vector is to be passed:

library(rlang)

filter_starwars <- function(filters) {
    filter(starwars, !!!parse_exprs(paste(filters, collapse = ";")))
}

# test 
filter_starwars(c("species == 'Human'", 
                  "homeworld %in% c('Tatooine', 'Alderaan')", 
                  "height > 175"))
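
To make the string-based variants (2) and (2a) less opaque, here is a minimal sketch of what parse_exprs() produces before !!! splices it into filter(). The variable names `filters`, `exprs` and `res` are only illustrative. Note that this approach parses the strings as arbitrary R code, so it is best kept to trusted input.

```r
library(dplyr)
library(rlang)

# Illustrative filter strings, as in the examples above
filters <- c("species == 'Human'", "height > 175")

# Joining with ";" yields one string holding several expressions;
# parse_exprs() turns it into a list of unevaluated calls
exprs <- parse_exprs(paste(filters, collapse = ";"))

length(exprs)   # one expression per filter string
exprs[[2]]      # the unevaluated call for the height condition

# Splicing the list is equivalent to typing the conditions by hand
res <- filter(starwars, !!!exprs)
```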
answered Nov 01 '22 07:11 by G. Grothendieck


Try capturing the arguments with quos() and splicing them into filter():

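filter_starwars <- function(...) {
  # capture the conditions unevaluated (avoid the name F, which masks the FALSE shorthand)
  conditions <- quos(...)
  filter(starwars, !!!conditions)
}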

filter_starwars(species == 'Human', homeworld %in% c('Tatooine', 'Alderaan'), height > 175)
# # A tibble: 7 × 13
#                  name height  mass  hair_color skin_color eye_color birth_year
#                 <chr>  <int> <dbl>       <chr>      <chr>     <chr>      <dbl>
# 1         Darth Vader    202   136        none      white    yellow       41.9
# 2           Owen Lars    178   120 brown, grey      light      blue       52.0
# 3   Biggs Darklighter    183    84       black      light     brown       24.0
# 4    Anakin Skywalker    188    84       blond       fair      blue       41.9
# 5         Cliegg Lars    183    NA       brown       fair      blue       82.0
# 6 Bail Prestor Organa    191    NA       black        tan     brown       67.0
# 7     Raymus Antilles    188    79       brown      light     brown         NA
# # ... with 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
# #   films <list>, vehicles <list>, starships <list>

See https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html. Briefly, quos() captures the arguments in ... as a list of quosures without evaluating them. !!! then unquotes and splices that list into the filter() call, where each condition is evaluated against the columns of the data frame.
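
The same splicing also works with expressions built from data, which is closer to the list-of-filters interface in the question. The following is only a sketch under assumptions: the function name `filter_starwars_spec`, the `op` field in each filter, and the `allowed_ops` whitelist are hypothetical and not part of the original code. It uses rlang's sym() and call2() to assemble each condition.

```r
library(dplyr)
library(rlang)

# Hypothetical whitelist, so that `op` cannot name an arbitrary function
allowed_ops <- c("==", "!=", ">", ">=", "<", "<=", "%in%")

# Hypothetical variant of filter_starwars(): each filter is a list with
# `var` (column name), `op` (operator as a string) and `values`
filter_starwars_spec <- function(filters) {
  conditions <- lapply(filters, function(f) {
    stopifnot(f$op %in% allowed_ops)
    # builds e.g. height > 175 or homeworld %in% c("Tatooine", "Alderaan")
    call2(f$op, sym(f$var), f$values)
  })
  filter(starwars, !!!conditions)
}

res <- filter_starwars_spec(list(
  list(var = "species",   op = "==",   values = "Human"),
  list(var = "homeworld", op = "%in%", values = c("Tatooine", "Alderaan")),
  list(var = "height",    op = ">",    values = 175)
))
```

Because the conditions are assembled from column names and values rather than parsed from strings, the caller can only choose among the whitelisted operators.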

answered Nov 01 '22 06:11 by Weihuang Wong