How to find values from one column that occur anywhere in another column of the same table

Question

Okay, let's imagine we have a table (dataframe, tibble) about articles written by authors. Each article has been reviewed by another author, but not all authors are reviewers, e.g.

articles <- tibble(author=c("user1", "user2", "user3"), reviewer=c("user2", "user3", "user2"))
> articles
# A tibble: 3 x 2
  author reviewer
  <chr>  <chr>   
1 user1  user2   
2 user2  user3   
3 user3  user2

to just look at the columns of interest here. So we can see that user2 and user3 have authored articles and are also reviewers. user1 is not a reviewer. How do we actually return the users who have both authored and reviewed articles? In base R you could do:

authors <- unique(articles$author)
reviewers <- unique(articles$reviewer)
> authors[authors %in% reviewers]
[1] "user2" "user3"

but how can we do this in the tidyverse?

doctorG · Accepted Answer

It feels the answer would likely be in dplyr, and I have refined my approach down to:

articles %>% 
  semi_join((.), 
    by=c("author"="reviewer")) %>% 
  select(author) %>%
  distinct

# A tibble: 2 x 1
  author
  <chr> 
1 user2 
2 user3

This is the correct answer, and I admit I've refined it somewhat even whilst writing this. But has anyone got any alternative takes on the problem? I'm actually quite pleased how it's boiled down, but it took me a little thinking so I'll post it in the hope it helps someone else.

How to find values from one column that occur anywhere in another column of the same table

Tags:

r

dplyr

doctorG

1 Answers

doctorG

Recent Activity

Donate For Us

How to find values from one column that occur anywhere in another column of the same table

Tags:

r

dplyr

doctorG

1 Answers

doctorG

Related questions

Recent Activity

Donate For Us