Okay, let's imagine we have a table (dataframe, tibble) about articles written by authors. Each article has been reviewed by another author, but not all authors are reviewers, e.g.
articles <- tibble(author=c("user1", "user2", "user3"), reviewer=c("user2", "user3", "user2"))
> articles
# A tibble: 3 x 2
author reviewer
<chr> <chr>
1 user1 user2
2 user2 user3
3 user3 user2
to just look at the columns of interest here. So we can see that user2
and user3
have authored articles and are also reviewers. user1
is not a reviewer. How do we actually return the users who have both authored and reviewed articles? In base R you could do:
authors <- unique(articles$author)
reviewers <- unique(articles$reviewer)
> authors[authors %in% reviewers]
[1] "user2" "user3"
but how can we do this in the tidyverse?
It feels the answer would likely be in dplyr
, and I have refined my approach down to:
articles %>%
semi_join((.),
by=c("author"="reviewer")) %>%
select(author) %>%
distinct
# A tibble: 2 x 1
author
<chr>
1 user2
2 user3
This is the correct answer, and I admit I've refined it somewhat even whilst writing this. But has anyone got any alternative takes on the problem? I'm actually quite pleased how it's boiled down, but it took me a little thinking so I'll post it in the hope it helps someone else.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With