 

Filter rows with dplyr/magrittr based on entire row

Tags: r, dplyr

dplyr's filter can select rows, but the condition is usually written in terms of specific columns, for example:

library(dplyr)
d <- data.frame(x = c(1, 2, NA), y = c(3, NA, NA), z = c(NA, 4, 5))
d %>% filter(!is.na(y))

I want to filter rows by the share of NA values across the entire row, keeping only rows where fewer than 50% of the values are NA, something like:

d %>% filter(mean(is.na(EACHROW)) < 0.5 )

How can I do this in a dplyr/magrittr pipeline?

Make42 asked Jan 07 '16 09:01


1 Answer

You could use rowSums or rowMeans for that. An example with the provided data:

> d
   x  y  z
1  1  3 NA
2  2 NA  4
3 NA NA  5

# with rowSums:
d %>% filter(rowSums(is.na(.))/ncol(.) < 0.5)

# with rowMeans:
d %>% filter(rowMeans(is.na(.)) < 0.5)

which both give:

  x  y  z
1 1  3 NA
2 2 NA  4

As you can see, row 3 is removed because two of its three values (about 67%) are NA.
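
To see why this works, it can help to look at the intermediate values. The sketch below uses only base R on the same d; inside the pipe, `.` stands for the whole data frame, so is.na(.) is the same as is.na(d). The names na_mat, na_share and keep are just illustrative.

```r
d <- data.frame(x = c(1, 2, NA), y = c(3, NA, NA), z = c(NA, 4, 5))

# is.na() on a data frame returns a logical matrix of the same shape
na_mat <- is.na(d)

# rowMeans() counts TRUE as 1 and FALSE as 0, giving the share of NA per row
na_share <- rowMeans(na_mat)

# rows 1 and 2 have one NA out of three columns, row 3 has two out of three
keep <- na_share < 0.5

print(na_share)
print(keep)
```

The logical vector keep is what filter (or base R row indexing) receives, one TRUE/FALSE per row.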


In base R, you could just do:

d[rowMeans(is.na(d)) < 0.5,]

to get the same result.
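
If the threshold needs to vary, the base R version could be wrapped in a small function. This is only a sketch: keep_rows_below_na_share and its max_na_share argument are hypothetical names, not part of dplyr or base R. It also passes drop = FALSE, because indexing a one-column data frame with d[rows, ] would otherwise return a plain vector.

```r
# Hypothetical helper (not part of dplyr or base R): keep rows whose
# share of NA values is below `max_na_share`.
keep_rows_below_na_share <- function(df, max_na_share = 0.5) {
  stopifnot(is.data.frame(df))
  # drop = FALSE keeps the result a data frame even with a single column
  df[rowMeans(is.na(df)) < max_na_share, , drop = FALSE]
}

d <- data.frame(x = c(1, 2, NA), y = c(3, NA, NA), z = c(NA, 4, 5))
res <- keep_rows_below_na_share(d)
print(res)
```

Since the data frame is the first argument, the function should also slot into a pipe, e.g. d %>% keep_rows_below_na_share(0.5).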

Jaap answered Nov 03 '22 22:11