Unable to subset (filter) a data frame due to NA's

Question

Why doesn't dplyr's filter return the same data.frame as base R subsetting in the code below?

In fact, neither of them works as expected. I'd like to remove the rows where, simultaneously, b==1 AND c==1. That is, I'd like to remove only the third row.

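library(dplyr)
df <- data.frame(a=c(0,0,0,0,1,1,1),
  b=c(0,0,1,1,0,0,1),
  c=c(1,NA,1,NA,1,NA,NA))

# dplyr: removes row 3, but also rows 4 and 7, where the condition is NA
filter(df, !(b==1 & c==1))

# base R: removes row 3, but returns rows 4 and 7 as rows filled with NA
df[!(df$b==1 & df$c==1),]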

Psidom · Accepted Answer

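filter() keeps only the rows where the condition is TRUE, so rows where it evaluates to NA are dropped as well. One option is to add complete.cases to the condition: it turns the NA results into FALSE in the logical vector, so the corresponding rows are kept after the negation. This uses the fact that NA & FALSE is FALSE: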

filter(df, !(b == 1 & c == 1 & complete.cases(df[c('b', 'c')])))

#   a b  c
# 1 0 0  1
# 2 0 0 NA
# 3 0 1 NA
# 4 1 0  1
# 5 1 0 NA
# 6 1 1 NA
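
To see where the two attempts in the question go wrong, it can help to look at the logical vector itself. The following is only an illustrative sketch in base R, not part of the original answer; the names cond, guarded, base_result and fixed are made up for it.

```r
df <- data.frame(a = c(0, 0, 0, 0, 1, 1, 1),
                 b = c(0, 0, 1, 1, 0, 0, 1),
                 c = c(1, NA, 1, NA, 1, NA, NA))

# Condition from the question: it is NA wherever b == 1 and c is missing
cond <- !(df$b == 1 & df$c == 1)
print(cond)

# Base R indexing turns every NA in the index into a row filled with NA
base_result <- df[cond, ]
print(base_result)

# Guarded condition from the answer: complete.cases() is FALSE on those rows,
# and NA & FALSE is FALSE, so no NA survives the negation
guarded <- !(df$b == 1 & df$c == 1 & complete.cases(df[c("b", "c")]))
print(guarded)

fixed <- df[guarded, ]
print(fixed)
```

Because guarded contains no NA, the same vector behaves identically in filter() and in base R subsetting.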

Here are more logical operations involving NA. They can look confusing at first glance, but they are consistent: the result is NA only when the unknown value could change the outcome.

NA & FALSE
# [1] FALSE
NA | TRUE
# [1] TRUE
NA & TRUE
# [1] NA
NA | FALSE
# [1] NA
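
Following the same reasoning, any comparison that can only yield TRUE or FALSE keeps NA out of the condition. As an alternative sketch (an assumption on my part, not something the answer proposes), %in% returns FALSE rather than NA for missing values, so it can stand in for == here; to_drop and kept are illustrative names.

```r
df <- data.frame(a = c(0, 0, 0, 0, 1, 1, 1),
                 b = c(0, 0, 1, 1, 0, 0, 1),
                 c = c(1, NA, 1, NA, 1, NA, NA))

# NA %in% 1 is FALSE, so to_drop contains only TRUE and FALSE
to_drop <- df$b %in% 1 & df$c %in% 1
print(to_drop)

# Negating a vector without NA keeps every row except the matching one
kept <- df[!to_drop, ]
print(kept)
```

The same condition should work inside dplyr as filter(df, !(b %in% 1 & c %in% 1)), since filter() only drops rows whose condition is FALSE or NA.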

Tags: r, data.table, dplyr, subset

Rodrigo Remedio
