When I use filter from the dplyr package to drop a level of a factor variable, filter also drops the NA values.  Here's an example:
library(dplyr)
set.seed(919)
(dat <- data.frame(var1 = factor(sample(c(1:3, NA), size = 10, replace = TRUE))))
#    var1
# 1  <NA>
# 2     3
# 3     3
# 4     1
# 5     1
# 6  <NA>
# 7     2
# 8     2
# 9  <NA>
# 10    1
filter(dat, var1 != 1)
#   var1
# 1    3
# 2    3
# 3    2
# 4    2
This does not seem ideal -- I only wanted to drop rows where var1 == 1.
It looks like this is occurring because any comparison with NA returns NA, which filter then drops.  So, for example, filter(dat, !(var1 %in% 1)) produces the correct results.  But is there a way to tell filter not to drop the NA values?
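To illustrate the point about NA comparisons, here is a minimal base R sketch. The vector `x` and the names `cmp` and `inn` are made up for illustration and are not part of the data above.

```r
# A small factor with one missing value (hypothetical example data).
x <- factor(c(1, 3, NA, 2))

# Comparing NA with anything yields NA, not TRUE or FALSE.
cmp <- x != 1
print(cmp)

# %in% is based on matching, so an NA that has no match gives FALSE.
inn <- x %in% 1
print(inn)

# Negating the %in% result therefore yields TRUE for the NA element,
# which is why !(var1 %in% 1) keeps the NA rows.
print(!inn)
```

Since filter keeps only rows where the condition is TRUE, the NA in `cmp` is what causes that row to disappear.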
You could use this:
filter(dat, var1 != 1 | is.na(var1))
#   var1
# 1 <NA>
# 2    3
# 3    3
# 4 <NA>
# 5    2
# 6    2
# 7 <NA>
With this condition, filter will not drop the NA rows.
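As a rough sketch of why this condition works, the logical vector can be inspected in base R without dplyr. The vector below retypes the values of `dat$var1` from the question by hand, and the names `keep` and `kept` are only illustrative.

```r
# Same values as dat$var1 in the question, entered manually.
var1 <- factor(c(NA, 3, 3, 1, 1, NA, 2, 2, NA, 1))

# For NA elements, var1 != 1 is NA but is.na(var1) is TRUE,
# and NA | TRUE evaluates to TRUE, so the condition has no NA left.
keep <- var1 != 1 | is.na(var1)
print(keep)

# Only the three elements equal to 1 are FALSE; the other seven are kept.
kept <- var1[keep]
print(length(kept))
```

Because the combined condition never evaluates to NA, there is nothing left for filter to discard apart from the rows where var1 is 1.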
Also, just for completeness: dropping NAs is the intended behavior of filter, as you can see from the following test:
test_that("filter discards NA", {
  temp <- data.frame(
    i = 1:5,
    x = c(NA, 1L, 1L, 0L, 0L)
  )
  res <- filter(temp, x == 1)
  expect_equal(nrow(res), 2L)
})
The test above was taken from the tests for filter in the dplyr repository on GitHub.