I am currently tidying the way I write my R scripts, so I am not really looking for an answer outside the tidyverse or one that uses deprecated / superseded syntax. I find dplyr's way of manipulating data easy to write and read, so I try to stick to it.
Using the iris dataset, here is a simplified version of what I want to do, in the superseded syntax (which works fine):
filter_at(iris, vars(starts_with("sepal")), any_vars(. > 3))
Obviously, I could write the condition in the long form to avoid using filter_at() and any_vars():
filter(iris, Sepal.Length > 3 | Sepal.Width > 3)
but that is redundant and, more importantly, not applicable when the column names are not fully known in advance, as in my case.
In dplyr's vignette("colwise"), it is stated:
Previously, filter() was paired with the all_vars() and any_vars() helpers. Now, across() is equivalent to all_vars(), and there’s no direct replacement for any_vars(). However you can make a simple helper yourself:
followed by a very simple example (any value > 0, so rowSums() is all that is needed). I feel that a disjunctive version of across() is missing for the specific case of filtering, one that would maintain the same expressivity.
In your opinion, what would be the cleanest syntax to achieve the same filtering without having to enumerate all the columns or to use superseded functions?
In short, the new function across() operates across multiple columns and multiple functions within existing dplyr verbs such as summarise() or mutate(). This makes it extremely powerful and time-saving. There is no longer any need for the scoped variants such as summarise_at(), mutate_if(), etc.
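As a small illustration of across() inside an existing verb, here is a minimal sketch that averages every column whose name starts with "Sepal" in one summarise() call. The object name sepal_means is only an illustrative choice.

```r
library(dplyr)

# Apply mean() to every column selected by starts_with("Sepal"),
# replacing what summarise_at() used to do.
sepal_means <- iris %>%
  summarise(across(starts_with("Sepal"), mean))

sepal_means
```

The same selection helpers (starts_with(), where(), everything(), ...) can be used in mutate() in exactly the same way.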
We can use filter() with across() and reduce():
library(dplyr)
library(purrr)
iris %>%
filter(across(starts_with("sepal"), ~ . > 5) %>% reduce(`|`))
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1 5.1 3.5 1.4 0.2 setosa
#2 5.4 3.9 1.7 0.4 setosa
#3 5.4 3.7 1.5 0.2 setosa
#4 5.8 4.0 1.2 0.2 setosa
#5 5.7 4.4 1.5 0.4 setosa
#6 5.4 3.9 1.3 0.4 setosa
#7 5.1 3.5 1.4 0.3 setosa
# ...
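To make the reduce() step less opaque, here is a rough sketch of what happens inside that filter() call, split into separate steps. The intermediate names flags and keep are hypothetical and only used for illustration.

```r
library(dplyr)
library(purrr)

# Step 1: one logical column per selected variable (TRUE where value > 5).
flags <- iris %>%
  select(starts_with("sepal")) %>%
  mutate(across(everything(), ~ .x > 5))

# Step 2: fold the columns together with `|`, giving one logical per row
# that is TRUE if at least one of the columns is TRUE.
keep <- reduce(flags, `|`)

# Same result as writing the condition out by hand.
identical(keep, iris$Sepal.Length > 5 | iris$Sepal.Width > 5)

# Step 3: use the logical vector to keep the matching rows.
result <- iris[keep, ]
```

Swapping `|` for `&` in the reduce() call would give the conjunctive ("all columns") version instead.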
Is this what you're looking for? Here we keep any row where either Sepal.Length or Sepal.Width is greater than 5 (the threshold used in the code below; the question used 3).
c_across() takes the specified columns and treats each row of those variables as a vector, iterating one row at a time. So you can filter rowwise by checking whether any of the specified columns in the row is greater than 5.
library(dplyr)
iris %>%
rowwise() %>%
filter(any(c_across(starts_with("sepal")) > 5))
#> # A tibble: 118 x 5
#> # Rowwise:
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 5.4 3.9 1.7 0.4 setosa
#> 3 5.4 3.7 1.5 0.2 setosa
#> 4 5.8 4 1.2 0.2 setosa
#> 5 5.7 4.4 1.5 0.4 setosa
#> 6 5.4 3.9 1.3 0.4 setosa
#> 7 5.1 3.5 1.4 0.3 setosa
#> 8 5.7 3.8 1.7 0.3 setosa
#> 9 5.1 3.8 1.5 0.3 setosa
#> 10 5.4 3.4 1.7 0.2 setosa
#> # … with 108 more rows
Created on 2020-07-02 by the reprex package (v0.3.0)
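For readers on a more recent dplyr, another option may be worth a look: if_any() was added in dplyr 1.0.4 (after the answers above were written) and behaves like a disjunctive counterpart to across() inside filter(). The sketch below assumes dplyr 1.0.4 or later is installed; the name res is only illustrative.

```r
library(dplyr)

# Keep rows where at least one "sepal" column is greater than 5.
# if_any() combines the per-column tests with OR; if_all() would use AND.
res <- iris %>%
  filter(if_any(starts_with("sepal"), ~ .x > 5))

nrow(res)
```

Unlike the rowwise() version, the result is not a rowwise data frame, so no ungroup() is needed afterwards.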