In dplyr 1.0.0, what is the right way to write a logical disjunction?

I am currently tidying up the way I write my R scripts, so I am not really looking for an answer outside the tidyverse, or one that relies on deprecated or superseded syntax. I find dplyr's way of manipulating data easy to write and read, so I try to stick to it.

Using the iris dataset, here is a simplified version of what I want to do, in the superseded syntax (which works fine):

library(dplyr)
filter_at(iris, vars(starts_with("sepal")), any_vars(. > 3))

Of course, I could write the condition in long form to avoid filter_at() and any_vars():

filter(iris, Sepal.Length > 3 | Sepal.Width > 3)

but that is redundant and, more importantly, not applicable when, as in my case, the column names are not fully known.
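For readers on later dplyr releases: dplyr 1.0.4 (released after this question was asked) added if_any() and if_all(), and if_any() is the disjunctive counterpart that covers this case without enumerating columns. A minimal sketch, assuming dplyr >= 1.0.4; it uses the threshold 5 from the answers because with 3 every iris row qualifies (the smallest Sepal.Length is 4.3). The object names are illustrative.

```r
library(dplyr)  # assumes dplyr >= 1.0.4, where if_any() was introduced

# Keep rows where at least one column whose name starts with "sepal"
# (starts_with() ignores case by default) is greater than 5
res_if_any <- iris %>%
  filter(if_any(starts_with("sepal"), ~ .x > 5))

# The explicit long form, only to compare against
res_long <- iris %>%
  filter(Sepal.Length > 5 | Sepal.Width > 5)

identical(res_if_any, res_long)
```

if_all() is the conjunctive counterpart, i.e. the replacement for all_vars().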

dplyr's vignette("colwise") states:

Previously, filter() was paired with the all_vars() and any_vars() helpers. Now, across() is equivalent to all_vars(), and there’s no direct replacement for any_vars(). However you can make a simple helper yourself:

This is followed by a rather trivial example (keeping rows where any value is > 0, which rowSums() handles directly). For filtering specifically, I feel a disjunctive version of across() is missing to keep the same expressiveness.
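The rowSums() trick is less limited than it looks: across() with a predicate returns one logical column per selected column, and rowSums() counts the TRUE values in each row, so rowSums(...) > 0 behaves as a row-wise "any" for any condition. A minimal sketch; row_any() is a hypothetical helper name, not part of dplyr.

```r
library(dplyr)

# Hypothetical helper (illustrative name): x is a data frame of logical
# columns; rowSums() counts the TRUEs per row, so "> 0" means
# "at least one column is TRUE in this row"
row_any <- function(x) rowSums(x) > 0

res_helper <- iris %>%
  filter(row_any(across(starts_with("sepal"), ~ .x > 5)))

nrow(res_helper)
```

One difference from the long form with `|`: a row containing NA gives NA from rowSums() and is dropped by filter(), whereas TRUE | NA is TRUE.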

In your opinion, what would be the cleanest syntax to achieve the same filtering without having to enumerate all the columns or to use superseded functions?


asked Jul 02 '20 19:07

marika

2 Answers

We can use filter() with across(), then collapse the resulting logical columns into a single vector with purrr's reduce(`|`):

library(dplyr)
library(purrr)
iris %>% 
    filter(across(starts_with("sepal"), ~ . > 5) %>% reduce(`|`))
#  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#1            5.1         3.5          1.4         0.2     setosa
#2            5.4         3.9          1.7         0.4     setosa
#3            5.4         3.7          1.5         0.2     setosa
#4            5.8         4.0          1.2         0.2     setosa
#5            5.7         4.4          1.5         0.4     setosa
#6            5.4         3.9          1.3         0.4     setosa
#7            5.1         3.5          1.4         0.3     setosa
# ...
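To see what reduce(`|`) does here: across() returns a data frame (a list) of logical columns, and the fold combines them pairwise with `|`, i.e. col1 | col2 | ... . A small sketch of the same fold using base R's Reduce() on a made-up list of two logical vectors:

```r
# Toy stand-in for the logical columns returned by across()
# (values are made up for illustration)
cols <- list(
  a = c(TRUE,  FALSE, FALSE),
  b = c(FALSE, FALSE, TRUE)
)

# Fold the columns pairwise with `|`; with more columns this is a | b | c | ...
Reduce(`|`, cols)
#> [1]  TRUE FALSE  TRUE
```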

answered Oct 17 '22 04:10

akrun

Is this what you're looking for? Here we keep any rows where either Sepal.Length or Sepal.Width is greater than 5.

In a rowwise() data frame, c_across() combines the selected columns of the current row into a single vector, so filter() is evaluated one row at a time. You can therefore filter row by row, keeping a row if any of the specified columns is greater than 5.

library(dplyr)

iris %>%
  rowwise() %>%
  filter(any(c_across(starts_with("sepal")) > 5))
#> # A tibble: 118 x 5
#> # Rowwise: 
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          5.4         3.9          1.7         0.4 setosa 
#>  3          5.4         3.7          1.5         0.2 setosa 
#>  4          5.8         4            1.2         0.2 setosa 
#>  5          5.7         4.4          1.5         0.4 setosa 
#>  6          5.4         3.9          1.3         0.4 setosa 
#>  7          5.1         3.5          1.4         0.3 setosa 
#>  8          5.7         3.8          1.7         0.3 setosa 
#>  9          5.1         3.8          1.5         0.3 setosa 
#> 10          5.4         3.4          1.7         0.2 setosa 
#> # … with 108 more rows

Created on 2020-07-02 by the reprex package (v0.3.0)
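As a sanity check, here is a sketch comparing the rowwise() result with the explicit long-form condition (same threshold, 5). The object names are illustrative. Because rowwise() evaluates the condition once per row, it reads naturally but is usually slower than a vectorized condition on large data.

```r
library(dplyr)

# Row-by-row version from this answer; ungroup() drops the rowwise grouping
rw <- iris %>%
  rowwise() %>%
  filter(any(c_across(starts_with("sepal")) > 5)) %>%
  ungroup()

# Explicit long form, for comparison only
long <- iris %>%
  filter(Sepal.Length > 5 | Sepal.Width > 5)

# Both should keep the same rows in the same order
nrow(rw)
identical(rw$Sepal.Length, long$Sepal.Length)
```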

answered Oct 17 '22 05:10

RyanFrost
