
How to filter data without losing NA rows using dplyr

Tags: r, filter, dplyr

How to subset data in R without losing NA rows?

The post above subsets using logical indexing. Is there a way to do it in dplyr?

Also, when does dplyr automatically drop NAs? In my experience, it removes NA rows when I filter out a specific string, e.g.:

b = a %>% filter(col != "str")

I would expect this not to exclude NA values, but it does. Yet when I use another form of filtering, NA rows are not excluded, e.g.:

b = a %>% filter(!grepl("str", col))

I would like to understand this feature of filter. I would appreciate any help. Thank you!

Brent Carbonera asked Sep 23 '17 10:09



2 Answers

The documentation for dplyr::filter says: "Unlike base subsetting, rows where the condition evaluates to NA are dropped."

NA != "str" evaluates to NA, so the row is dropped by filter.

grepl("str", NA) returns FALSE rather than NA, so !grepl("str", NA) is TRUE and the row is kept.

If you want filter to keep NA, you could do filter(is.na(col) | col != "str")
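To make the three cases above concrete, here is a small sketch that first computes the logical vector each condition produces and then passes the same condition to filter(). The data frame `a` and the variable names are illustrative assumptions, not taken from the question's real data.

```r
library(dplyr)

# Illustrative data: one ordinary value, one NA, one value to exclude
a <- data.frame(col = c("hello", NA, "str"), stringsAsFactors = FALSE)

# The logical vectors that filter() receives for each condition
cond_neq   <- a$col != "str"                 # TRUE, NA, FALSE
cond_grepl <- !grepl("str", a$col)           # TRUE, TRUE, FALSE
cond_keep  <- is.na(a$col) | a$col != "str"  # TRUE, TRUE, FALSE

# filter() keeps only rows where the condition is TRUE, so an NA drops the row
rows_neq   <- a %>% filter(col != "str")               # keeps "hello"
rows_grepl <- a %>% filter(!grepl("str", col))         # keeps "hello" and NA
rows_keep  <- a %>% filter(is.na(col) | col != "str")  # keeps "hello" and NA
```

The only difference between the first and third calls is that is.na(col) turns the NA in the condition into TRUE before filter() sees it.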

Andrew Gustar answered Oct 06 '22 16:10


If you want to keep the rows where the filter condition evaluates to NA, you can simply turn those NAs into TRUE using replace_na from tidyr.

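library(dplyr)  # filter() and %>%
library(tidyr)  # replace_na()
a <- data.frame(col = c("hello", NA, "str"))
a %>% filter((col != "str") %>% replace_na(TRUE))  # NA in the condition becomes TRUE, so the NA row is kept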
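
Two other spellings of the same idea, offered only as a sketch and not part of the original answer: base R's `%in%` returns FALSE for a missing value, and dplyr's `coalesce()` can replace NA in the condition with TRUE. The object names `kept_in` and `kept_coalesce` are illustrative.

```r
library(dplyr)

a <- data.frame(col = c("hello", NA, "str"), stringsAsFactors = FALSE)

# %in% returns FALSE (never NA) for a missing value, so the negation is TRUE
kept_in <- a %>% filter(!(col %in% "str"))

# coalesce() fills NA in the condition with TRUE, much like replace_na(TRUE)
kept_coalesce <- a %>% filter(coalesce(col != "str", TRUE))
```

Both calls should keep the "hello" row and the NA row; which one reads best is largely a matter of taste.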

qwr answered Oct 06 '22 17:10