R dplyr: filter common values by group

Tags: r, dplyr

I need to find the values that are common to all groups, ideally using dplyr in R.

Given this dataset:

  group   val
  <fct> <dbl>
1 a         1
2 a         2
3 a         3
4 b         3
5 b         4
6 b         5
7 c         1
8 c         3

the expected output is:

group   val
<fct> <dbl>
1 a         3
2 b         3
3 c         3

because 3 is the only value that occurs in every group.
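One quick base-R way to confirm which values occur in every group is to cross-tabulate values against groups and keep the rows with a non-zero count in every column. This is a minimal sketch that recreates the question's data inline; the names `tab` and `common` are illustrative only.

```r
# Recreate the example data from the question
dd <- data.frame(group = c("a", "a", "a", "b", "b", "b", "c", "c"),
                 val   = c(1, 2, 3, 3, 4, 5, 1, 3))

# Cross-tabulate: one row per distinct value, one column per group
tab <- table(dd$val, dd$group)

# A value is common to all groups if it appears at least once in every column
common <- as.numeric(rownames(tab)[rowSums(tab > 0) == ncol(tab)])
common
```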

This code does not work:

library(dplyr)

# Filter the data
dd %>% 
  group_by(group) %>% 
  filter(all(val))           # does not work: all() coerces each group's val to logical (non-zero is TRUE), so every row is kept

An existing example solves a similar problem, but it relies on a predefined vector of shared values. What if I do not know in advance which values are shared?
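If the shared values are not known in advance, one option is to compute them first and then filter against that vector, which turns this into the "known vector" case. The sketch below assumes the question's `dd` and uses base R's `split()` plus `Reduce(intersect, ...)` to intersect the per-group value sets; the name `shared` is hypothetical.

```r
library(dplyr)

dd <- data.frame(group = c("a", "a", "a", "b", "b", "b", "c", "c"),
                 val   = c(1, 2, 3, 3, 4, 5, 1, 3))

# split() gives one vector of values per group; Reduce() intersects them
# pairwise, leaving only the values present in every group
shared <- Reduce(intersect, split(dd$val, dd$group))

# Filter against the computed vector of shared values
result <- dd %>% filter(val %in% shared)
result
```

Computing the intersection separately keeps the filter step a simple `%in%` test, at the cost of a second pass over the data.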

Dummy example:

# Reproducible example data: values by group
group = c("a", "a", "a",
          "b", "b", "b",
          "c", "c")
val = c(1,2,3,
        3,4,5,
        1,3)

dd <- data.frame(group,
                 val)
asked Oct 25 '25 05:10 by maycca


1 Answer

group_by isolates each group, so we can't very well group_by(group) and compare between groups. Instead, we can group_by(val) and keep the values that appear in all of the groups:

library(dplyr)

dd %>%
  group_by(val) %>%
  filter(n_distinct(group) == n_distinct(dd$group))
# # A tibble: 3 x 2
# # Groups:   val [1]
#   group   val
#   <chr> <dbl>
# 1 a         3
# 2 b         3
# 3 c         3

This is one of the rare cases where we want to use data$column inside a dplyr verb: n_distinct(dd$group) refers explicitly to the ungrouped original data to get the total number of groups (it could also be pre-computed). By contrast, n_distinct(group) is evaluated on the grouped data piped into filter, so it gives the number of distinct groups for each value (because we group_by(val)).
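As a minimal sketch of the pre-computed variant mentioned above: the total group count is stored once before the pipeline, so the filter no longer reaches back into `dd$group`. The name `n_groups` is ours, and the trailing `ungroup()` is an optional addition that drops the leftover `val` grouping shown in the printed output above.

```r
library(dplyr)

dd <- data.frame(group = c("a", "a", "a", "b", "b", "b", "c", "c"),
                 val   = c(1, 2, 3, 3, 4, 5, 1, 3))

# Pre-compute the total number of groups once, on the ungrouped data
n_groups <- n_distinct(dd$group)

result <- dd %>%
  group_by(val) %>%
  # keep values whose rows span every group
  filter(n_distinct(group) == n_groups) %>%
  # remove the val grouping so later steps see an ordinary tibble
  ungroup()
result
```

Because `n_groups` is not a column of `dd`, dplyr looks it up in the calling environment, so inside `filter()` it stays a single constant.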

answered Oct 27 '25 00:10 by Gregor Thomas


