I'd like to remove rows corresponding to a particular combination of variables from my data frame.
Here's a dummy data :
father<- c(1, 1, 1, 1, 1) mother<- c(1, 1, 1, NA, NA) children <- c(NA, NA, 2, 5, 2) cousins <- c(NA, 5, 1, 1, 4) dataset <- data.frame(father, mother, children, cousins) dataset father mother children cousins 1 1 NA NA 1 1 NA 5 1 1 2 1 1 NA 5 1 1 NA 2 4
I want to filter this row :
father mother children cousins 1 1 NA NA
I can do it with :
test <- dataset %>% filter(father==1 & mother==1) %>% filter (is.na(children)) %>% filter (is.na(cousins)) test
My question : I have many columns like grand father, uncle1, uncle2, uncle3 and I want to avoid something like that:
filter (is.na(children)) %>% filter (is.na(cousins)) %>% filter (is.na(uncle1)) %>% filter (is.na(uncle2)) %>% filter (is.na(uncle3)) and so on...
How can I use dplyr to say filter all the column with na (except father==1 & mother==1)
In this, first, pass your dataframe object to the filter function, then in the condition parameter write the column name in which you want to filter multiple values then put the %in% operator, and then pass a vector containing all the string values which you want in the result.
Multiple conditions can also be combined using which() method in R. The which() function in R returns the position of the value which satisfies the given condition. The %in% operator is used to check a value in the vector specified.
%>% is called the forward pipe operator in R. It provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. It is defined by the package magrittr (CRAN) and is heavily used by dplyr (CRAN).
A possible dplyr
(0.5.0.9004 <= version < 1.0) solution is:
# > packageVersion('dplyr') # [1] ‘0.5.0.9004’ dataset %>% filter(!is.na(father), !is.na(mother)) %>% filter_at(vars(-father, -mother), all_vars(is.na(.)))
Explanation:
vars(-father, -mother)
: select all columns except father
and mother
.all_vars(is.na(.))
: keep rows where is.na
is TRUE
for all the selected columns.note: any_vars
should be used instead of all_vars
if rows where is.na
is TRUE
for any column are to be kept.
Update (2020-11-28)
As the _at
functions and vars
have been superseded by the use of across
since dplyr 1.0, the following way (or similar) is recommended now:
dataset %>% filter(across(c(father, mother), ~ !is.na(.x))) %>% filter(across(c(-father, -mother), is.na))
See more example of across
and how to rewrite previous code with the new approach here: Colomn-wise operatons or type vignette("colwise")
in R after installing the latest version of dplyr
.
If you're using dplyr version >= 1.0.4 you really should use if_any
or if_all
, which specifically combines the results of the predicate function into a single logical vector making it very useful in filter
. The syntax is identical to across
, but these verbs were added to help fill this need: if_any/if_all.
library(dplyr) dataset %>% filter(if_all(-c(father, mother), ~ is.na(.)), if_all(c(father, mother), ~ !is.na(.)))
Output
father mother children cousins 1 1 1 NA NA
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With