dplyr filter with condition on multiple columns

Tags: r, dplyr

I'd like to remove rows corresponding to a particular combination of variables from my data frame.

Here's some dummy data:

    father <- c(1, 1, 1, 1, 1)
    mother <- c(1, 1, 1, NA, NA)
    children <- c(NA, NA, 2, 5, 2)
    cousins <- c(NA, 5, 1, 1, 4)

    dataset <- data.frame(father, mother, children, cousins)
    dataset

    #> father mother children cousins
    #>      1      1       NA      NA
    #>      1      1       NA       5
    #>      1      1        2       1
    #>      1     NA        5       1
    #>      1     NA        2       4

I want to filter for this row:

    father mother children cousins
         1      1       NA      NA

I can do it with:

    library(dplyr)

    test <- dataset %>%
      filter(father == 1 & mother == 1) %>%
      filter(is.na(children)) %>%
      filter(is.na(cousins))
    test

My question: I have many more columns, like grandfather, uncle1, uncle2, uncle3, and I want to avoid something like this:

    filter(is.na(children)) %>%
      filter(is.na(cousins)) %>%
      filter(is.na(uncle1)) %>%
      filter(is.na(uncle2)) %>%
      filter(is.na(uncle3))
    # and so on...

How can I use dplyr to filter for rows where every column is NA, except father and mother (which should both equal 1)?

440

asked May 12 '17 13:05

Wilcar

2 Answers

A possible dplyr solution (0.5.0.9004 <= version < 1.0) is:

    # > packageVersion('dplyr')
    # [1] ‘0.5.0.9004’

    dataset %>%
        filter(!is.na(father), !is.na(mother)) %>%
        filter_at(vars(-father, -mother), all_vars(is.na(.)))

Explanation:

- vars(-father, -mother): select all columns except father and mother.
- all_vars(is.na(.)): keep rows where is.na is TRUE for all the selected columns.

Note: any_vars should be used instead of all_vars if rows where is.na is TRUE for any column are to be kept.
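To make the all_vars / any_vars difference concrete, here is a minimal sketch on the question's data. It assumes a dplyr version in which filter_at(), vars(), all_vars() and any_vars() are still available (they are superseded in dplyr 1.x but, as far as I know, not removed). The names parents_known, all_na and any_na are only illustrative.

```r
library(dplyr)

# Dummy data from the question
dataset <- data.frame(
  father   = c(1, 1, 1, 1, 1),
  mother   = c(1, 1, 1, NA, NA),
  children = c(NA, NA, 2, 5, 2),
  cousins  = c(NA, 5, 1, 1, 4)
)

# Shared first step: keep rows where both parents are present
parents_known <- dataset %>%
  filter(!is.na(father), !is.na(mother))

# all_vars(): every remaining column must be NA
all_na <- parents_known %>%
  filter_at(vars(-father, -mother), all_vars(is.na(.)))

# any_vars(): at least one remaining column must be NA
any_na <- parents_known %>%
  filter_at(vars(-father, -mother), any_vars(is.na(.)))

nrow(all_na)  # 1 (children and cousins both NA)
nrow(any_na)  # 2 (additionally the row with children NA, cousins 5)
```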

Update (2020-11-28)

As the _at functions and vars have been superseded by the use of across since dplyr 1.0, the following way (or similar) is recommended now:

    dataset %>%
        filter(across(c(father, mother), ~ !is.na(.x))) %>%
        filter(across(c(-father, -mother), is.na))

See more examples of across, and how to rewrite older code with the new approach, in the Column-wise operations vignette, or type vignette("colwise") in R after installing the latest version of dplyr.

136

answered Sep 21 '22 01:09

mt1022

dplyr >= 1.0.4

If you're using dplyr version >= 1.0.4, you should use if_any or if_all, which combine the results of the predicate function into a single logical vector, making them well suited to filter. The syntax is identical to across, but these verbs were added specifically to fill this need.

    library(dplyr)

    dataset %>%
      filter(if_all(-c(father, mother), ~ is.na(.)),
             if_all(c(father, mother), ~ !is.na(.)))

Output

      father mother children cousins
    1      1      1       NA      NA
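As a rough sketch of how this could be adapted to the exact condition in the question (father == 1 and mother == 1), and of how if_all differs from if_any, something like the following should work, assuming dplyr >= 1.0.4. The object names kept, any_missing and dropped are made up for illustration.

```r
library(dplyr)

# Dummy data from the question
dataset <- data.frame(
  father   = c(1, 1, 1, 1, 1),
  mother   = c(1, 1, 1, NA, NA),
  children = c(NA, NA, 2, 5, 2),
  cousins  = c(NA, 5, 1, 1, 4)
)

# Keep rows where both parents equal 1 and every other column is NA
kept <- dataset %>%
  filter(father == 1, mother == 1, if_all(-c(father, mother), is.na))

# Same parent condition, but at least one other column is NA
any_missing <- dataset %>%
  filter(father == 1, mother == 1, if_any(-c(father, mother), is.na))

# Drop that combination instead of keeping it.
# %in% returns FALSE (not NA) for missing parents, so those rows survive.
dropped <- dataset %>%
  filter(!(father %in% 1 & mother %in% 1 & if_all(-c(father, mother), is.na)))

nrow(kept)         # 1
nrow(any_missing)  # 2
nrow(dropped)      # 4
```

%in% is used in the last call because == returns NA when a parent is missing, and filter() drops rows whose condition evaluates to NA.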

answered Sep 21 '22 01:09

LMc
