dplyr identify rows that do not have a matched column within group

Question

I have a df that looks as follow:

GROUP    NAME     TITLE     LEVEL
  A      John      Lead       0
  A      John      Staff      1
  A      Jake      Jr         0
  B      Bob       Lead       1
  B      Bob       Lead       0
  C      Andrew    Jr         0
  C      Andrew    Jr         1
  C      Rebecca   Staff      0

What I am trying to do is identify names within each group that have a title in level 1 that they do not have for level 0. In this example I would be looking for:

GROUP    NAME     TITLE     LEVEL
  A      John      Staff      1

Because in group A John has a title of staff for level 1 but not for level 0.

Jon Spring · Accepted Answer

The simplest approach I can think of would be an anti-join, where we look for GROUP/NAME/TITLE in Level 1 that is not in Level 0.

anti_join(df |> filter(LEVEL == 1),
          df |> filter(LEVEL == 0),
          join_by(GROUP, NAME, TITLE))

Result

# A tibble: 1 × 4
  GROUP NAME  TITLE LEVEL
  <chr> <chr> <chr> <dbl>
1 A     John  Staff     1

Maël · Answer

Within each name, there should be:

At least one observation for each levels of LEVEL (0 and 1).
At least one (length(...) != 0) TITLE among the LEVEL == 1 that is different (setdiff) from any of the TITLEs with LEVEL == 0.

df |> 
  filter(any(LEVEL == 0) & LEVEL == 1 &
         length(setdiff(TITLE[LEVEL == 1], TITLE[LEVEL == 0])) != 0,
         .by = NAME)

#   GROUP NAME TITLE LEVEL
# 1     A John Staff     1

dplyr identify rows that do not have a matched column within group

Tags:

r

dplyr

user3249770

2 Answers

Jon Spring

Maël

Recent Activity

Donate For Us

dplyr identify rows that do not have a matched column within group

Tags:

r

dplyr

user3249770

2 Answers

Jon Spring

Maël

Related questions

Recent Activity

Donate For Us