Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Select groups which have at least one of a certain value

Tags:

r

dplyr

tidy

How to select groups based on a condition on the individual rows, say keep all groups that contain at least one (ANY) of a certain value, e.g. 4, (or any other condition that is TRUE at least once). Or phrased the other way around: if a group does not have any rows where condition is true, the entire group should be removed.

Let's take a very simple data, with two groups, and I want to select the group that has at least one row with a Value of 4, (i.e. group B here)

library(dplyr) df <- data.frame(Group = LETTERS[c(1,1,1,2,2,2)], Value=c(1:5, 4))  df #   Group Value # 1     A     1 # Group A has no values == 4 ~~> remove entire group  # 2     A     2 # 3     B     3 # 4     B     4 # Group B has at least one 4 ~~> keep the whole group 

Doing group_by() and then filter (as in this post) will only select individual rows that contains a value of 4, not the whole group:

df %>%   group_by(Group) %>%   filter(Value == 4) #    Group Value #   <fctr> <int> # 1      B     4 
like image 826
Matifou Avatar asked Nov 27 '16 01:11

Matifou


1 Answers

This turns out to be pretty easy: you just need to use the any() function in the filter call. Indeed, it appears that:

  • filter(any(...)) evaluates at the group_by() level,

  • filter(...) evaluates at the rowwise() level, even when preceded by group_by().

Hence use:

 df %>%     group_by(Group) %>%     filter(any(Value==4))   Group Value  <fctr> <int> 1      B     3 2      B     4 

Interestingly, the same appear with mutate, compare:

df %>% group_by(Group) %>% mutate(check1=any(Value==4),         check2=Value==4)      Group Value check1 check2   <fctr> <int>  <lgl>  <lgl> 1      A     1  FALSE  FALSE 2      A     2  FALSE  FALSE 3      B     3   TRUE  FALSE 4      B     4   TRUE   TRUE 
like image 157
Matifou Avatar answered Sep 25 '22 13:09

Matifou