Unable to subset (filter) a data frame due to NA's

Why doesn't dplyr's filter return the same data.frame as base R subsetting in the code below?

In fact, neither of them works as expected. I'd like to remove the rows where b == 1 and c == 1 hold simultaneously. That is, I'd like to remove only the third row.

library(dplyr)
df <- data.frame(a = c(0, 0, 0, 0, 1, 1, 1),
                 b = c(0, 0, 1, 1, 0, 0, 1),
                 c = c(1, NA, 1, NA, 1, NA, NA))

# dplyr version
filter(df, !(b == 1 & c == 1))

# base R version
df[!(df$b == 1 & df$c == 1), ]
Rodrigo Remedio asked Jan 06 '23 13:01


1 Answer

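The condition b == 1 & c == 1 evaluates to NA whenever b is 1 and c is NA, and the negation of NA is still NA. dplyr's filter drops rows whose condition is NA, while base R subsetting returns an all-NA row for each of them, which is why the two results differ. One fix is to use complete.cases to turn those NA values into FALSE before negating, so that the corresponding rows are kept. This relies on the fact that NA & FALSE is FALSE: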

filter(df, !(b == 1 & c == 1 & complete.cases(df[c('b', 'c')])))

#   a b  c
# 1 0 0  1
# 2 0 0 NA
# 3 0 1 NA
# 4 1 0  1
# 5 1 0 NA
# 6 1 1 NA
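To see where the two original attempts go wrong, it may help to print the condition itself. The sketch below uses only base R and the df from the question; the name keep is introduced here purely for illustration.

```r
df <- data.frame(a = c(0, 0, 0, 0, 1, 1, 1),
                 b = c(0, 0, 1, 1, 0, 0, 1),
                 c = c(1, NA, 1, NA, 1, NA, NA))

# The original condition: NA wherever b is 1 and c is missing (rows 4 and 7)
keep <- !(df$b == 1 & df$c == 1)
keep
# [1]  TRUE  TRUE FALSE    NA  TRUE  TRUE    NA

# Base R subsetting turns each NA in the index into an all-NA row,
# so the result has 6 rows, two of which contain only NA
nrow(df[keep, ])
# [1] 6
```

dplyr's filter, by contrast, keeps only the rows where the condition is TRUE, so with this condition it drops rows 4 and 7 along with row 3.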

Here are more logical operations involving NA. They look a little confusing at first glance, but they follow a consistent rule: the result is NA only when the missing value could change the outcome.

NA & FALSE
# [1] FALSE
NA | TRUE
# [1] TRUE
NA & TRUE
# [1] NA
NA | FALSE
# [1] NA
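The NA | TRUE rule suggests a base R alternative that does not need complete.cases: keep a row when the drop condition is either FALSE or unknown. This is only a sketch under the same df as in the question; the names drop and result are made up for this example.

```r
df <- data.frame(a = c(0, 0, 0, 0, 1, 1, 1),
                 b = c(0, 0, 1, 1, 0, 0, 1),
                 c = c(1, NA, 1, NA, 1, NA, NA))

# Rows to drop: TRUE only for row 3, NA for rows 4 and 7
drop <- df$b == 1 & df$c == 1

# !drop is NA where drop is NA, but NA | TRUE is TRUE,
# so is.na(drop) rescues exactly those rows
result <- df[!drop | is.na(drop), ]
result
#   a b  c
# 1 0 0  1
# 2 0 0 NA
# 4 0 1 NA
# 5 1 0  1
# 6 1 0 NA
# 7 1 1 NA
```

Unlike filter, base R subsetting keeps the original row names, so the missing 3 shows directly which row was removed.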
Psidom answered Jan 17 '23 16:01