
Using filter() with across() to keep all rows of a data frame that include a missing value for any variable


Sometimes I want to view all the rows of a data frame that would be dropped if I removed every row with a missing value in any variable. In this case, I'm specifically interested in how to do this with dplyr 1.0's across() function used inside the filter() verb.

Here is an example data frame:

    df <- tribble(
      ~id, ~x, ~y,
      1, 1, 0,
      2, 1, 1,
      3, NA, 1,
      4, 0, 0,
      5, 1, NA
    )

Code for keeping rows that DO NOT include any missing values is provided on the tidyverse website. Specifically, I can use:

    df %>%
      filter(
        across(
          .cols = everything(),
          .fns = ~ !is.na(.x)
        )
      )

Which returns:

    # A tibble: 3 x 3
         id     x     y
      <dbl> <dbl> <dbl>
    1     1     1     0
    2     2     1     1
    3     4     0     0

However, I can't figure out how to return the opposite -- rows with a missing value in any variable. The result I'm looking for is:

    # A tibble: 2 x 3
         id     x     y
      <dbl> <dbl> <dbl>
    1     3    NA     1
    2     5     1    NA

My first thought was just to remove the !:

    df %>%
      filter(
        across(
          .cols = everything(),
          .fns = ~ is.na(.x)
        )
      )

But, that returns zero rows.

Of course, I can get the answer I want with this code if I know all variables that have a missing value ahead of time:

    df %>%
      filter(is.na(x) | is.na(y))

But I'm looking for a solution that doesn't require me to know ahead of time which variables have a missing value. I'm also aware of how to do this with the filter_all() function:

    df %>%
      filter_all(any_vars(is.na(.)))

However, filter_all() has been superseded by the use of across() inside an existing verb. See https://dplyr.tidyverse.org/articles/colwise.html

Other unsuccessful attempts I've made are:

    df %>%
      filter(
        across(
          .cols = everything(),
          .fns = ~any_vars(is.na(.x))
        )
      )

    df %>%
      filter(
        across(
          .cols = everything(),
          .fns = ~!!any_vars(is.na(.x))
        )
      )

    df %>%
      filter(
        across(
          .cols = everything(),
          .fns = ~!!any_vars(is.na(.))
        )
      )

    df %>%
      filter(
        across(
          .cols = everything(),
          .fns = ~any(is.na(.x))
        )
      )

    df %>%
      filter(
        across(
          .cols = everything(),
          .fns = ~any(is.na(.))
        )
      )


asked Jun 02 '20 21:06

Brad Cannell

2 Answers

This is now possible as of dplyr 1.0.4. The new if_any() function takes the place of across() for this filtering use case.

    library(dplyr)

    df <- tribble(~ id, ~ x, ~ y,
                  1, 1, 0,
                  2, 1, 1,
                  3, NA, 1,
                  4, 0, 0,
                  5, 1, NA)

    df %>%
      filter(if_any(everything(), is.na))
    #> # A tibble: 2 x 3
    #>      id     x     y
    #>   <dbl> <dbl> <dbl>
    #> 1     3    NA     1
    #> 2     5     1    NA

Created on 2021-02-10 by the reprex package (v0.3.0)

See here for more details: https://www.tidyverse.org/blog/2021/02/dplyr-1-0-4-if-any/
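
As a minimal sketch (assuming dplyr 1.0.4 or later), the following may help show why simply removing the ! from the across() version returned zero rows. Inside filter(), across() combined the per-column results with AND, which is what if_all() does explicitly, so a row survived only if every column, id included, was NA. if_any() combines them with OR instead. The names all_missing, any_missing, and dropped are made up for illustration.

```r
library(dplyr)

df <- tribble(
  ~id, ~x, ~y,
  1, 1, 0,
  2, 1, 1,
  3, NA, 1,
  4, 0, 0,
  5, 1, NA
)

# AND across columns: keep rows where id, x, and y are all NA (none here)
all_missing <- df %>% filter(if_all(everything(), is.na))

# OR across columns: keep rows where at least one column is NA
any_missing <- df %>% filter(if_any(everything(), is.na))

# Same rows via negation: "not (every column is non-missing)"
dropped <- df %>% filter(!if_all(everything(), ~ !is.na(.x)))

nrow(all_missing)   # 0
any_missing$id      # 3 5
dropped$id          # 3 5
```

The negated if_all() form is handy when the "keep complete rows" condition already exists and you only want its complement.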

answered Sep 18 '22 15:09

Emman

We can use reduce() from purrr to combine the per-column is.na() results with the | operator:

    library(dplyr)
    library(purrr)

    df %>%
      filter(across(everything(), is.na) %>% reduce(`|`))
    # A tibble: 2 x 3
    #     id     x     y
    #  <dbl> <dbl> <dbl>
    #1     3    NA     1
    #2     5     1    NA
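
To make the reduce() step less opaque, here is a rough sketch of the intermediate values, using the example data from the question. Applying is.na() to each column yields one logical vector per column, and reduce() with | ORs those vectors element by element, so the result has one value per row. By contrast, any(is.na(col)) collapses a whole column to a single TRUE or FALSE, which is likely why the any() attempts in the question could not pick out individual rows. The names na_flags, row_has_na, and col_has_na are illustrative only.

```r
library(dplyr)
library(purrr)

df <- tribble(
  ~id, ~x, ~y,
  1, 1, 0,
  2, 1, 1,
  3, NA, 1,
  4, 0, 0,
  5, 1, NA
)

# One logical vector per column, each with one entry per row
na_flags <- map(df, is.na)

# Element-wise OR across the columns: one TRUE/FALSE per row
row_has_na <- reduce(na_flags, `|`)

# any() summarises a column instead: one TRUE/FALSE per column
col_has_na <- map_lgl(df, ~ any(is.na(.x)))

row_has_na   # FALSE FALSE TRUE FALSE TRUE
col_has_na   # id = FALSE, x = TRUE, y = TRUE

# The row-wise vector can be passed straight to filter()
df %>% filter(row_has_na)
```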

answered Sep 18 '22 15:09

akrun
