Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

dplyr identify rows that do not have a matched column within group

Tags:

r

dplyr

I have a df that looks as follow:

GROUP    NAME     TITLE     LEVEL
  A      John      Lead       0
  A      John      Staff      1
  A      Jake      Jr         0
  B      Bob       Lead       1
  B      Bob       Lead       0
  C      Andrew    Jr         0
  C      Andrew    Jr         1
  C      Rebecca   Staff      0

What I am trying to do is identify names within each group that have a title in level 1 that they do not have for level 0. In this example I would be looking for:

GROUP    NAME     TITLE     LEVEL
  A      John      Staff      1

Because in group A John has a title of staff for level 1 but not for level 0.

like image 427
user3249770 Avatar asked Sep 02 '25 03:09

user3249770


2 Answers

The simplest approach I can think of would be an anti-join, where we look for GROUP/NAME/TITLE in Level 1 that is not in Level 0.

anti_join(df |> filter(LEVEL == 1),
          df |> filter(LEVEL == 0),
          join_by(GROUP, NAME, TITLE))

Result

# A tibble: 1 × 4
  GROUP NAME  TITLE LEVEL
  <chr> <chr> <chr> <dbl>
1 A     John  Staff     1
like image 101
Jon Spring Avatar answered Sep 05 '25 01:09

Jon Spring


Within each name, there should be:

  1. At least one observation for each levels of LEVEL (0 and 1).
  2. At least one (length(...) != 0) TITLE among the LEVEL == 1 that is different (setdiff) from any of the TITLEs with LEVEL == 0.
df |> 
  filter(any(LEVEL == 0) & LEVEL == 1 &
         length(setdiff(TITLE[LEVEL == 1], TITLE[LEVEL == 0])) != 0,
         .by = NAME)

#   GROUP NAME TITLE LEVEL
# 1     A John Staff     1
like image 24
Maël Avatar answered Sep 04 '25 23:09

Maël