This is my df
df <- structure(structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L), .Label = c("A", "B", "C", "D", "E"), class = "factor"), y = c(NA, NA, NA, NA, 1, NA, NA, NA, 1, 2, NA, NA, 1, 2, 3, NA, 2, 2, 3, 4, NA, 3, 3, 4, 5), x = c(1L, 2L, 3L, 4L,5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L)), .Names = c("group", "y", "x"), row.names = c(NA, 25L), class = "data.frame"))
> df
group y x
1 A NA 1
2 A NA 2
3 A NA 3
4 A NA 4
5 A 1 5
6 B NA 1
7 B NA 2
8 B NA 3
9 B 1 4
10 B 2 5
11 C NA 1
12 C NA 2
13 C 1 3
14 C 2 4
15 C 3 5
16 D NA 1
17 D 2 2
18 D 2 3
19 D 3 4
20 D 4 5
21 E NA 1
22 E 3 2
23 E 3 3
24 E 4 4
25 E 5 5
My goal is to calculate the mean of y per x value (across groups), using mutate(). But first I'd like to filter the data so that only those values of x remain for which there are at least 3 non-NA values of y. In this example, that means keeping only the entries for which x is at least 3. I can't figure out how to write the filter() call. Any suggestions?
By using the na.omit(), complete.cases(), rowSums(), and drop_na() methods, you can remove rows that contain NA (missing values) from an R data frame.
To remove all rows that have an NA, we can use the na.omit() function. For example, if we have a data frame called df that contains some NA values, we can remove every row that contains at least one NA with the command na.omit(df).
How do you subset a data frame by column value or by column name in R? By using the base R df[] notation, or subset(), you can easily subset an R data frame (data.frame) by column value or by column name.
You could try:
library(dplyr)
df %>%
  group_by(x) %>%                        # group by x, as per the OP's clarification
  filter(sum(!is.na(y)) >= 3) %>%        # keep only x values with at least 3 non-NA y
  mutate(Mean = mean(y, na.rm = TRUE))   # mean of y per x value, ignoring NAs
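For comparison, the same filter-then-average idea can be sketched in base R with ave(), which applies a function within each level of a grouping variable and returns a vector as long as the input. This is only a sketch under some assumptions: the mean is taken over y within each x value (as the question describes), the data frame is rebuilt here with data.frame() so the snippet runs on its own, and the names n_obs and keep are made up for illustration.

```r
# Rebuild the example data so the snippet is self-contained
df <- data.frame(
  group = factor(rep(c("A", "B", "C", "D", "E"), each = 5)),
  y = c(NA, NA, NA, NA, 1,
        NA, NA, NA, 1, 2,
        NA, NA, 1, 2, 3,
        NA, 2, 2, 3, 4,
        NA, 3, 3, 4, 5),
  x = rep(1:5, times = 5)
)

# For every row, count the non-NA y values that share its x value
n_obs <- ave(as.integer(!is.na(df$y)), df$x, FUN = sum)

# Keep only rows whose x value has at least 3 non-NA observations
keep <- df[n_obs >= 3, ]

# Add the per-x mean of y, ignoring NAs (the base R analogue of mutate())
keep$Mean <- ave(keep$y, keep$x, FUN = function(v) mean(v, na.rm = TRUE))

print(keep)
```

With this data only the rows with x of 3, 4, or 5 should remain, and every row keeps its original y next to the per-x mean.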