
Filter based on NA in dplyr

Tags: r, dplyr

This is my df

df <- structure(structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L), .Label = c("A", "B", "C", "D", "E"), class = "factor"), y = c(NA, NA, NA, NA, 1, NA, NA, NA, 1, 2, NA, NA, 1, 2, 3, NA, 2, 2, 3, 4, NA, 3, 3, 4, 5), x = c(1L, 2L, 3L, 4L,5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L)), .Names = c("group", "y", "x"), row.names = c(NA, 25L), class = "data.frame"))

> df
   group  y x
1      A NA 1
2      A NA 2
3      A NA 3
4      A NA 4
5      A  1 5
6      B NA 1
7      B NA 2
8      B NA 3
9      B  1 4
10     B  2 5
11     C NA 1
12     C NA 2
13     C  1 3
14     C  2 4
15     C  3 5
16     D NA 1
17     D  2 2
18     D  2 3
19     D  3 4
20     D  4 5
21     E NA 1
22     E  3 2
23     E  3 3
24     E  4 4
25     E  5 5

My goal is to calculate the mean of y per x value (across groups), using mutate. But first I'd like to filter the data so that only those values of x remain for which there are at least 3 non-NA values of y. In this example, that means keeping only the entries for which x is at least 3. I can't figure out how to write the filter(). Any suggestions?

erc asked Jan 16 '15 16:01




1 Answer

You could try

library(dplyr)

df %>%
  group_by(x) %>%                       # group by x, as per the OP's clarification
  filter(sum(!is.na(y)) >= 3) %>%       # keep x values with at least 3 non-NA y
  mutate(Mean = mean(y, na.rm = TRUE))  # mean of y within each x, ignoring NA
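
As a check on the group_by(x) variant, it can help to count the non-NA values of y per x and see which x values pass the threshold. The following is only a sketch: the data.frame() call is a compact rebuild of the question's df (assumed to hold the same values as the structure() call), and the names counts, kept, n_non_na and mean_y are made up for illustration.

```r
library(dplyr)

# Compact rebuild of the question's df (assumed equivalent to the structure() call)
df <- data.frame(
  group = factor(rep(c("A", "B", "C", "D", "E"), each = 5)),
  y = c(NA, NA, NA, NA, 1,
        NA, NA, NA, 1, 2,
        NA, NA, 1, 2, 3,
        NA, 2, 2, 3, 4,
        NA, 3, 3, 4, 5),
  x = rep(1:5, times = 5)
)

# One row per x: number of non-NA y values and their mean
counts <- df %>%
  group_by(x) %>%
  summarise(n_non_na = sum(!is.na(y)),
            mean_y = mean(y, na.rm = TRUE))

# Keep only the x values with at least 3 non-NA y values
kept <- counts %>% filter(n_non_na >= 3)
print(kept)
```

For this data, only x = 3, 4 and 5 remain (with 3, 4 and 5 non-NA values), which matches the expectation stated in the question. Using summarise() gives one row per x; mutate() would instead keep every original row and repeat the mean alongside it.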
akrun answered Oct 03 '22 05:10
