How to subset data in R without losing NA rows?
The post above subsets using logical indexing. Is there a way to do it in dplyr?
Also, when does dplyr automatically drop NAs? In my experience, it removes NA rows when I filter out a specific string, e.g.:
b = a %>% filter(col != "str")
I would think this would not exclude NA values, but it does. But when I use another form of filtering, it does not automatically exclude NA, e.g.:
b = a %>% filter(!grepl("str", col))
I would like to understand this feature of filter. I would appreciate any help. Thank you!
The filter() function is used to subset a data frame, retaining all rows that satisfy your conditions. To be retained, a row must produce a value of TRUE for all conditions. Note that when a condition evaluates to NA, the row is dropped, unlike base subsetting with [.
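To make the difference concrete, here is a minimal sketch comparing the two behaviours. The data frame `a` and the names `cond`, `base_rows`, and `dplyr_rows` are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data: one ordinary value, one NA, and the value to exclude.
a <- data.frame(col = c("hello", NA, "str"), stringsAsFactors = FALSE)

cond <- a$col != "str"   # logical vector: TRUE, NA, FALSE

# Base `[` returns a row for the NA condition, with every field set to NA.
base_rows <- a[cond, , drop = FALSE]

# filter() keeps only rows where the condition is exactly TRUE.
dplyr_rows <- a %>% filter(col != "str")
```

Here `base_rows` should have two rows and `dplyr_rows` only one.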
By using the na.omit(), complete.cases(), rowSums(), and drop_na() methods, you can remove rows that contain NA (missing values) from an R data frame.
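For reference, a rough sketch of how those four approaches might look side by side. The data frame `df` and the result names `r1` to `r5` are invented for the example.

```r
library(tidyr)

# Hypothetical data with NAs in different columns.
df <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA), stringsAsFactors = FALSE)

r1 <- na.omit(df)                    # drops rows with any NA
r2 <- df[complete.cases(df), ]       # logical index of fully observed rows
r3 <- df[rowSums(is.na(df)) == 0, ]  # count NAs per row, keep rows with none
r4 <- drop_na(df)                    # tidyr equivalent of the above
r5 <- drop_na(df, x)                 # only look at column x when dropping
```

The first four should agree on this data; `r5` keeps the row where only `y` is missing.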
The documentation for dplyr::filter says: "Unlike base subsetting, rows where the condition evaluates to NA are dropped."
NA != "str" evaluates to NA, so the row is dropped by filter.
!grepl("str", NA) returns TRUE, so the row is kept.
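You can check both conditions in plain R without dplyr. A small sketch, where the vector `x` and the names `neq` and `grp` are just for illustration:

```r
x <- c("hello", NA, "str")

neq <- x != "str"        # comparing with NA propagates NA
grp <- !grepl("str", x)  # grepl() gives FALSE for an NA input, so the negation is TRUE

print(neq)
print(grp)
```

The second element is the one that differs: NA in `neq`, TRUE in `grp`. That is why filter drops the row in one case and keeps it in the other.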
If you want filter to keep NA, you could do filter(is.na(col) | col != "str").
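A short sketch of that on toy data (the data frame `a` and the result names are made up). The `%in%` variant is an alternative not mentioned above: `%in%` never returns NA, so negating it also lets the NA row through.

```r
library(dplyr)

# Hypothetical toy data.
a <- data.frame(col = c("hello", NA, "str"), stringsAsFactors = FALSE)

# Explicitly let NA rows through alongside the non-matching rows.
kept <- a %>% filter(is.na(col) | col != "str")

# Alternative: %in% yields FALSE (not NA) for a missing value.
kept_in <- a %>% filter(!col %in% "str")
```

Both results should contain the "hello" row and the NA row, and drop only "str".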
If you want to keep the NAs created by the filter condition, you can simply turn the condition's NAs into TRUE using replace_na from tidyr.
library(dplyr)
library(tidyr)
a <- data.frame(col = c("hello", NA, "str"))
a %>% filter((col != "str") %>% replace_na(TRUE))