Remove rows where all variables are NA using dplyr

Question

I'm having some issues with a seemingly simple task: to remove all rows where all variables are NA using dplyr. I know it can be done using base R (Remove rows in R matrix where all data is NA and Removing empty rows of a data file in R), but I'm curious to know if there is a simple way of doing it using dplyr.

Example:

library(tidyverse) dat <- tibble(a = c(1, 2, NA), b = c(1, NA, NA), c = c(2, NA, NA)) filter(dat, !is.na(a) | !is.na(b) | !is.na(c))

The filter call above does what I want but it's infeasible in the situation I'm facing (as there is a large number of variables). I guess one could do it by using filter_ and first creating a string with the (long) logical statement, but it seems like there should be a simpler way.

Another way is to use rowwise() and do():

na <- dat %>%    rowwise() %>%    do(tibble(na = !all(is.na(.)))) %>%    .$na filter(dat, na)

but that does not look too nice, although it gets the job done. Other ideas?

MarkusN · Accepted Answer

Since dplyr 0.7.0 new, scoped filtering verbs exists. Using filter_any you can easily filter rows with at least one non-missing column:

# dplyr 0.7.0 dat %>% filter_all(any_vars(!is.na(.)))

Using @hejseb benchmarking algorithm it appears that this solution is as efficient as f4.

UPDATE:

Since dplyr 1.0.0 the above scoped verbs are superseded. Instead the across function family was introduced, which allows to perform a function on multiple (or all) columns. Filtering rows with at least one column being not NA looks now like this:

# dplyr 1.0.0 dat %>% filter(if_any(everything(), ~ !is.na(.)))

Remove rows where all variables are NA using dplyr

Tags:

r

dplyr

tidyverse

hejseb

1 Answers

MarkusN

Recent Activity

Donate For Us

Remove rows where all variables are NA using dplyr

Tags:

r

dplyr

tidyverse

hejseb

1 Answers

MarkusN

Related questions

Recent Activity

Donate For Us