I am having trouble subsetting a data.frame based on the reciprocity of values between two columns.
Here is an example df to illustrate the problem:
rater <- c(21, 23, 26, 24)
ratee <- c(24, 21, 23, 21)
rating.data <- data.frame(rater, ratee)
Output:
rater ratee
1 21 24
2 23 21
3 26 23
4 24 21
I would like to subset this df by only keeping the rows that have reciprocal values.
The resulting subset should look like this:
rater ratee
1 21 24
4 24 21
Any thoughts would be much appreciated!
We could sort the values within each row and then use duplicated to keep the rows whose sorted pair occurs more than once:
m1 <- t(apply(rating.data, 1, sort))
rating.data[duplicated(m1)|duplicated(m1, fromLast = TRUE),]
# rater ratee
#1 21 24
#4 24 21
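One caveat with the sort-and-duplicated idea: it also flags rows that are exact repeats of each other, even when no reversed row exists. If that matters, a stricter check can be sketched in base R as below. This assumes the columns are named rater and ratee as in the question; the helper name has_reciprocal is made up for illustration.

```r
rater <- c(21, 23, 26, 24)
ratee <- c(24, 21, 23, 21)
rating.data <- data.frame(rater, ratee)

# Hypothetical helper: TRUE for rows whose reversed pair (ratee, rater)
# also occurs somewhere in the data as (rater, ratee).
has_reciprocal <- function(d) {
  forward  <- paste(d$rater, d$ratee, sep = "_")
  backward <- paste(d$ratee, d$rater, sep = "_")
  forward %in% backward
}

reciprocal.rows <- rating.data[has_reciprocal(rating.data), ]
reciprocal.rows
```

Note that a self-rating row (rater equal to ratee) counts as its own reciprocal here; filter those out first if that is not wanted.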
Another possibility:
library(dplyr)
rating.data %>% inner_join(.,.,by=c("rater" = "ratee","ratee"="rater"))
Or this; for some reason it's twice as fast on your small example (though slower than akrun's solution):
merge(rating.data,setNames(rating.data,rev(names(rating.data))))
To keep the second solution flexible with your additional columns:
merge(rating.data,setNames(rating.data[,c("rater","ratee")],c("ratee","rater")))
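To see why the merge approach works, it may help to look at the intermediate step. The sketch below rebuilds the question's data and uses two illustrative variable names (swapped, reciprocal) that are not part of the original answer.

```r
rating.data <- data.frame(rater = c(21, 23, 26, 24),
                          ratee = c(24, 21, 23, 21))

# Same data with the column names swapped, so its "rater" column
# holds the original ratee values and vice versa.
swapped <- setNames(rating.data, rev(names(rating.data)))

# merge() joins on all shared column names by default (here both),
# so a row survives only if its reversed pair exists in the data.
reciprocal <- merge(rating.data, swapped)
reciprocal
```

Unlike the duplicated approach, merge returns a new data.frame sorted by the join columns, so the original row names (1 and 4) are not preserved.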
library(data.table)
N <- 10  # number of rows
dt1 <- data.table(a = 1:N, b = sample(N))  # example data.table that holds the pairs
# create one key per pair following the rule "min_max", so (a, b) and (b, a) share a key
dt1[, d := ifelse(a < b, paste0(a, "_", b), paste0(b, "_", a))]
setkey(dt1, d)  # set the key
dt1[dt1[, .N, d][N != 1], .(a, b)]  # keep only the pairs that appear more than once
You can also use pmin and pmax to assist with grouping, and then filter on all groups having more than one entry, i.e.
library(dplyr)
rating.data %>%
group_by(grp = paste(pmin(rater, ratee), pmax(rater, ratee), sep = "_")) %>%
filter(n() > 1) %>%
ungroup() %>%
select(-grp)
which gives,
# A tibble: 2 x 2
  rater ratee
  <dbl> <dbl>
1    21    24
2    24    21
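The same pmin/pmax grouping idea can be sketched without dplyr, in case a base R version is preferred. The variable names grp, grp.size and paired are illustrative only, and a "_" separator is used in the key so that ids such as 1 and 23 cannot collide with 12 and 3.

```r
rating.data <- data.frame(rater = c(21, 23, 26, 24),
                          ratee = c(24, 21, 23, 21))

# Order-independent key: smaller id first, larger id second.
grp <- paste(pmin(rating.data$rater, rating.data$ratee),
             pmax(rating.data$rater, rating.data$ratee), sep = "_")

# Count how many rows share each key, aligned with the original rows.
grp.size <- ave(seq_along(grp), grp, FUN = length)

# Keep rows whose key occurs more than once.
paired <- rating.data[grp.size > 1, ]
paired
```

As with the other grouping-based answers, two identical rows would also form a group of size two, so this assumes each (rater, ratee) pair appears at most once.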