Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to find values from one column that occur anywhere in another column of the same table

Tags:

r

dplyr

Okay, let's imagine we have a table (dataframe, tibble) about articles written by authors. Each article has been reviewed by another author, but not all authors are reviewers, e.g.

articles <- tibble(author=c("user1", "user2", "user3"), reviewer=c("user2", "user3", "user2"))
> articles
# A tibble: 3 x 2
  author reviewer
  <chr>  <chr>   
1 user1  user2   
2 user2  user3   
3 user3  user2   

to just look at the columns of interest here. So we can see that user2 and user3 have authored articles and are also reviewers. user1 is not a reviewer. How do we actually return the users who have both authored and reviewed articles? In base R you could do:

authors <- unique(articles$author)
reviewers <- unique(articles$reviewer)
> authors[authors %in% reviewers]
[1] "user2" "user3"

but how can we do this in the tidyverse?

like image 905
doctorG Avatar asked Jan 26 '23 16:01

doctorG


1 Answers

It feels the answer would likely be in dplyr, and I have refined my approach down to:

articles %>% 
  semi_join((.), 
    by=c("author"="reviewer")) %>% 
  select(author) %>%
  distinct
# A tibble: 2 x 1
  author
  <chr> 
1 user2 
2 user3 

This is the correct answer, and I admit I've refined it somewhat even whilst writing this. But has anyone got any alternative takes on the problem? I'm actually quite pleased how it's boiled down, but it took me a little thinking so I'll post it in the hope it helps someone else.

like image 165
doctorG Avatar answered Mar 22 '23 23:03

doctorG