 

In dplyr 1.0.0, what is the right way to write a logical disjunction?

Tags: r, dplyr

I am currently tidying up the way I write my R scripts, so I am not really looking for an answer outside the tidyverse or one that uses deprecated / superseded syntax. I find dplyr's way of manipulating data easy to write and read, so I try to stick to it.

Using the iris dataset, here is a simplified version of what I want to do, in the superseded syntax (which works fine):

filter_at(iris, vars(starts_with("sepal")), any_vars(. > 3))

Obviously, I could write the condition in the long form to avoid using filter_at() and any_vars() :

filter(iris, Sepal.Length > 3 | Sepal.Width > 3)

but it is redundant and, more importantly, not applicable when, as in my case, the column names are not fully known in advance.

In dplyr's vignette("colwise"), it is stated:

Previously, filter() was paired with the all_vars() and any_vars() helpers. Now, across() is equivalent to all_vars(), and there’s no direct replacement for any_vars(). However you can make a simple helper yourself:

followed by a very simple example (any value > 0, so rowSums() is all that is needed). I feel that filter() lacks a disjunctive counterpart to across(), which it would need in order to keep the same expressiveness.

In your opinion, what would be the cleanest syntax to achieve the same filtering without having to enumerate all the columns or to use superseded functions?

asked Jul 02 '20 19:07 by marika



2 Answers

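We can use filter() with across() to build one logical column per selected variable, then combine those columns with purrr's reduce() and `|` (the example uses a threshold of 5):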

library(dplyr)
library(purrr)
iris %>% 
    filter(across(starts_with("sepal"), ~ . > 5) %>% reduce(`|`))
#  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#1            5.1         3.5          1.4         0.2     setosa
#2            5.4         3.9          1.7         0.4     setosa
#3            5.4         3.7          1.5         0.2     setosa
#4            5.8         4.0          1.2         0.2     setosa
#5            5.7         4.4          1.5         0.4     setosa
#6            5.4         3.9          1.3         0.4     setosa
#7            5.1         3.5          1.4         0.3     setosa
# ...
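
If a newer dplyr can be used (an assumption, since the question targets 1.0.0), the if_any() helper introduced in dplyr 1.0.4 expresses the same disjunction without purrr. A minimal sketch with the same threshold of 5; the name any_gt5 is only illustrative:

```r
library(dplyr)

# Assumes dplyr >= 1.0.4, where if_any() and if_all() were added.
# if_any() keeps a row when the predicate holds for at least one
# of the selected columns (starts_with() ignores case by default).
any_gt5 <- iris %>%
  filter(if_any(starts_with("sepal"), ~ .x > 5))

head(any_gt5)
```

if_all() is the conjunctive counterpart, so the pair covers what all_vars() and any_vars() used to do.
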
answered Oct 17 '22 04:10 by akrun


Is this what you're looking for? Here we keep any row where either Sepal.Length or Sepal.Width is greater than 5.

After rowwise(), c_across() combines the selected columns of the current row into a single vector, and the filter condition is evaluated one row at a time. So you can filter row by row by checking whether any of the selected values in that row is greater than 5.

library(dplyr)

iris %>%
  rowwise() %>%
  filter(any(c_across(starts_with("sepal")) > 5))
#> # A tibble: 118 x 5
#> # Rowwise: 
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          5.4         3.9          1.7         0.4 setosa 
#>  3          5.4         3.7          1.5         0.2 setosa 
#>  4          5.8         4            1.2         0.2 setosa 
#>  5          5.7         4.4          1.5         0.4 setosa 
#>  6          5.4         3.9          1.3         0.4 setosa 
#>  7          5.1         3.5          1.4         0.3 setosa 
#>  8          5.7         3.8          1.7         0.3 setosa 
#>  9          5.1         3.8          1.5         0.3 setosa 
#> 10          5.4         3.4          1.7         0.2 setosa 
#> # … with 108 more rows

Created on 2020-07-02 by the reprex package (v0.3.0)
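
As the "# Rowwise:" line in the output shows, the result is still a rowwise tibble. A small sketch of dropping that state with ungroup() and checking the result against the long-form condition from the question (threshold changed to 5); the names by_row and long_form are only illustrative:

```r
library(dplyr)

by_row <- iris %>%
  rowwise() %>%
  filter(any(c_across(starts_with("sepal")) > 5)) %>%
  ungroup()  # drop the rowwise state so later verbs work on whole columns

# The same condition written out column by column, for comparison
long_form <- filter(iris, Sepal.Length > 5 | Sepal.Width > 5)

# Both approaches should keep the same number of rows
nrow(by_row) == nrow(long_form)
```

Row-by-row evaluation is usually slower than a vectorised condition on large data, so this form is mostly attractive for its readability.
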

answered Oct 17 '22 05:10 by RyanFrost