How to select groups based on a condition on the individual rows, say keep all groups that contain at least one (ANY) of a certain value, e.g. 4, (or any other condition that is TRUE
at least once). Or phrased the other way around: if a group does not have any rows where condition is true, the entire group should be removed.
Let's take a very simple data, with two groups, and I want to select the group that has at least one row with a Value
of 4, (i.e. group B here)
library(dplyr) df <- data.frame(Group = LETTERS[c(1,1,1,2,2,2)], Value=c(1:5, 4)) df # Group Value # 1 A 1 # Group A has no values == 4 ~~> remove entire group # 2 A 2 # 3 B 3 # 4 B 4 # Group B has at least one 4 ~~> keep the whole group
Doing group_by()
and then filter
(as in this post) will only select individual rows that contains a value of 4, not the whole group:
df %>% group_by(Group) %>% filter(Value == 4) # Group Value # <fctr> <int> # 1 B 4
This turns out to be pretty easy: you just need to use the any()
function in the filter
call. Indeed, it appears that:
filter(any(...))
evaluates at the group_by()
level,
filter(...)
evaluates at the rowwise()
level, even when preceded by group_by()
.
Hence use:
df %>% group_by(Group) %>% filter(any(Value==4)) Group Value <fctr> <int> 1 B 3 2 B 4
Interestingly, the same appear with mutate, compare:
df %>% group_by(Group) %>% mutate(check1=any(Value==4), check2=Value==4) Group Value check1 check2 <fctr> <int> <lgl> <lgl> 1 A 1 FALSE FALSE 2 A 2 FALSE FALSE 3 B 3 TRUE FALSE 4 B 4 TRUE TRUE
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With