How to filter data without losing NA rows using dplyr

Tags:

r

filter

dplyr

How to subset data in R without losing NA rows?

The post above subsets using logical indexing. Is there a way to do it in dplyr?

Also, when does dplyr automatically drop NAs? In my experience, it removes NA rows when I filter out a specific string, e.g.:

b = a %>% filter(col != "str")

I would expect this to keep NA values, but it drops them. When I write the filter in a different form, however, the NA rows are not dropped, e.g.:

b = a %>% filter(!grepl("str", col))

I would like to understand this feature of filter. I would appreciate any help. Thank you!

asked Sep 23 '17 10:09

Brent Carbonera

2 Answers

The documentation for dplyr::filter says: "Unlike base subsetting, rows where the condition evaluates to NA are dropped."

NA != "str" evaluates to NA, so that row is dropped by filter.

grepl("str", NA) returns FALSE, so !grepl("str", NA) is TRUE and that row is kept.

If you want filter to keep NA, you could do filter(is.na(col) | col != "str")
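
A minimal sketch of the behaviour described above, assuming dplyr is installed. The data frame is made up for illustration; the names `a` and `col` follow the question, and the extra value "a str b" is added only to show that `!=` tests exact equality while `grepl` matches substrings, so the two filters are not interchangeable.

```r
library(dplyr)

# Hypothetical example data (not from the original post).
a <- data.frame(col = c("hello", NA, "str", "a str b"), stringsAsFactors = FALSE)

# A comparison with NA yields NA, and filter() drops rows where the condition is NA.
cmp_na <- NA != "str"

# grepl() returns FALSE for NA input, so its negation is TRUE and the row is kept.
grepl_na <- !grepl("str", NA)

# The NA row is dropped here.
dropped_na <- a %>% filter(col != "str")

# The NA row is kept by testing for NA explicitly.
kept_na <- a %>% filter(is.na(col) | col != "str")

# grepl() keeps the NA row but also removes "a str b", since it matches substrings.
grepl_rows <- a %>% filter(!grepl("str", col))
```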

answered Oct 06 '22 16:10

Andrew Gustar

If you want to keep NAs created by the filter condition you can simply turn the condition NAs into TRUEs using replace_na from tidyr.

library(dplyr)
library(tidyr)  # provides replace_na()
a <- data.frame(col = c("hello", NA, "str"))
a %>% filter((col != "str") %>% replace_na(TRUE))
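
As a sketch of two alternatives that follow the same idea without tidyr (these are not part of the answer above): `coalesce()` from dplyr can replace NA in the condition with TRUE, and `%in%` from base R never returns NA, so negating it keeps NA rows. The variable names `with_coalesce` and `with_in` are only illustrative.

```r
library(dplyr)

a <- data.frame(col = c("hello", NA, "str"), stringsAsFactors = FALSE)

# coalesce() fills NA values in the logical condition with TRUE.
with_coalesce <- a %>% filter(coalesce(col != "str", TRUE))

# NA %in% "str" is FALSE rather than NA, so the negation is TRUE for the NA row.
with_in <- a %>% filter(!col %in% "str")
```

Both forms keep the "hello" and NA rows and drop only "str".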

answered Oct 06 '22 17:10

qwr
