Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

dplyr filter with condition on multiple columns

Tags:

r

dplyr

I'd like to remove rows corresponding to a particular combination of variables from my data frame.

Here's a dummy data :

father<- c(1, 1, 1, 1, 1) mother<- c(1, 1, 1, NA, NA)  children <- c(NA, NA, 2, 5, 2)  cousins   <- c(NA, 5, 1, 1, 4)    dataset <- data.frame(father, mother, children, cousins)   dataset   father  mother  children cousins 1      1       NA      NA 1      1       NA       5 1      1        2       1 1     NA        5       1 1     NA        2       4 

I want to filter this row :

  father  mother  children cousins     1      1       NA      NA 

I can do it with :

test <- dataset %>%  filter(father==1 & mother==1) %>% filter (is.na(children)) %>% filter (is.na(cousins)) test   

My question : I have many columns like grand father, uncle1, uncle2, uncle3 and I want to avoid something like that:

  filter (is.na(children)) %>%   filter (is.na(cousins)) %>%   filter (is.na(uncle1)) %>%   filter (is.na(uncle2)) %>%   filter (is.na(uncle3))    and so on... 

How can I use dplyr to say filter all the column with na (except father==1 & mother==1)

like image 440
Wilcar Avatar asked May 12 '17 13:05

Wilcar


People also ask

How do I filter multiple values in a column in R?

In this, first, pass your dataframe object to the filter function, then in the condition parameter write the column name in which you want to filter multiple values then put the %in% operator, and then pass a vector containing all the string values which you want in the result.

How do I filter a Dataframe in multiple conditions in R?

Multiple conditions can also be combined using which() method in R. The which() function in R returns the position of the value which satisfies the given condition. The %in% operator is used to check a value in the vector specified.

What does %>% do in dplyr?

%>% is called the forward pipe operator in R. It provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. It is defined by the package magrittr (CRAN) and is heavily used by dplyr (CRAN).


2 Answers

A possible dplyr(0.5.0.9004 <= version < 1.0) solution is:

# > packageVersion('dplyr') # [1] ‘0.5.0.9004’  dataset %>%     filter(!is.na(father), !is.na(mother)) %>%     filter_at(vars(-father, -mother), all_vars(is.na(.))) 

Explanation:

  • vars(-father, -mother): select all columns except father and mother.
  • all_vars(is.na(.)): keep rows where is.na is TRUE for all the selected columns.

note: any_vars should be used instead of all_vars if rows where is.na is TRUE for any column are to be kept.


Update (2020-11-28)

As the _at functions and vars have been superseded by the use of across since dplyr 1.0, the following way (or similar) is recommended now:

dataset %>%     filter(across(c(father, mother), ~ !is.na(.x))) %>%     filter(across(c(-father, -mother), is.na)) 

See more example of across and how to rewrite previous code with the new approach here: Colomn-wise operatons or type vignette("colwise") in R after installing the latest version of dplyr.

like image 136
mt1022 Avatar answered Sep 21 '22 01:09

mt1022


dplyr >= 1.0.4

If you're using dplyr version >= 1.0.4 you really should use if_any or if_all, which specifically combines the results of the predicate function into a single logical vector making it very useful in filter. The syntax is identical to across, but these verbs were added to help fill this need: if_any/if_all.

library(dplyr)  dataset %>%    filter(if_all(-c(father, mother), ~ is.na(.)), if_all(c(father, mother), ~ !is.na(.))) 

Output

  father mother children cousins 1      1      1       NA      NA 
like image 44
LMc Avatar answered Sep 21 '22 01:09

LMc