
When filtering with dplyr in R, why do filtered out levels of a variable remain in filtered data? [duplicate]

Tags: r, filter, dplyr

I'm trying to filter a data set using the filter command from the dplyr package. The filtering itself works exactly as I would hope, but when I draw charts from the filtered data, all of the levels that I filtered out still show up (albeit with no values), and that throws off my horizontal axis.

So two questions:

1) Why are these filtered levels still in the data?

2) How do I filter to make these no longer present?

Here is a small example you can run to see what I am talking about:

library(dplyr)
library(ggvis)

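# small example frame
# (stringsAsFactors = TRUE makes y a factor explicitly; since R 4.0
# data.frame() keeps strings as character by default)
data <- data.frame(
  x = 1:10,
  y = rep(c("yes", "no"), 5),
  stringsAsFactors = TRUE
)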

# filtering to only include data with "yes" in y variable
new_data <- data %>%
  filter(y == "yes")

levels(new_data$y) ## Why is "no" showing up as a level for this if I've filtered that out?

# Illustration of the filtered values still showing up on axis
new_data %>%
  ggvis(~y, ~x) %>%
  layer_bars()

Thanks for any help.

Nathan F asked Dec 19 '22 01:12


1 Answer

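Factors in R do not automatically drop unused levels when they are subset or filtered; the set of levels is stored with the factor, separately from the values that happen to be present. You may think this is a silly default (I do), but it's easy to deal with: just use the droplevels function on the result.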

new_data <- data %>%
  filter(y == "yes") %>%
  droplevels()
levels(new_data$y)
## [1] "yes"
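
To see why the level hangs around, here is a minimal base-R sketch, assuming y is stored as a factor. The variable names (y, kept) are only illustrative and not part of the question's data; the point is that subsetting a factor keeps its full level set until you explicitly rebuild or drop it.

```r
# y stored explicitly as a factor (an assumption; recent R versions
# keep strings as character in data.frame() unless told otherwise)
y <- factor(rep(c("yes", "no"), 5))

# Subsetting keeps the full level set, not just the values present
kept <- y[y == "yes"]

levels(kept)   # still lists both "no" and "yes"
table(kept)    # "no" is reported with a count of 0

# Either of these discards the unused level
levels(droplevels(kept))
levels(factor(kept))
```

The zero-count entry in table(kept) is the same empty category that ends up on the chart axis.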

If you do this all the time, you could define a new function:

dfilter <- function(...) droplevels(filter(...))
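
A quick usage sketch of that wrapper, assuming dplyr is installed and that y is created as a factor (the explicit factor() call is my assumption, added so the example behaves the same across R versions):

```r
library(dplyr)

# Wrapper from above: filter, then drop unused factor levels
dfilter <- function(...) droplevels(filter(...))

data <- data.frame(
  x = 1:10,
  y = factor(rep(c("yes", "no"), 5))  # factor made explicit (assumption)
)

# Works in a pipe, because the piped data frame becomes the first argument
new_data <- data %>% dfilter(y == "yes")

levels(new_data$y)   # only "yes" remains
nrow(new_data)       # 5 rows kept
```

Note that droplevels on a data frame drops unused levels from every factor column, not just the one you filtered on, which may or may not be what you want.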
Ben Bolker answered Mar 15 '23 22:03