
Subset by row and column reciprocity [duplicate]

Tags: r, duplicates

I am having trouble subsetting a data.frame based on reciprocity between two columns: a row should be kept only if another row contains the same pair of values swapped.

Here is an example data frame to illustrate the problem:

rater <- c(21, 23, 26, 24)
ratee <- c(24, 21, 23, 21)
rating.data <- data.frame(rater, ratee)

Output:

   rater ratee
1    21    24
2    23    21
3    26    23
4    24    21

I would like to subset this df by only keeping the rows that have reciprocal values.

The resulting subset should look like this:

   rater ratee
1    21    24
4    24    21

Any thoughts would be much appreciated!

SeekingData asked Sep 05 '17 15:09


4 Answers

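We could sort the values within each row and then use duplicated() in both directions, which keeps every row whose sorted pair occurs more than once: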

m1 <- t(apply(rating.data, 1, sort))
rating.data[duplicated(m1)|duplicated(m1, fromLast = TRUE),]
#   rater ratee
#1    21    24
#4    24    21
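
One caveat with sorting plus duplicated() is that two identical rows (the same rater rating the same ratee twice) would also be flagged, even when no reverse pair exists. A minimal base-R sketch of a direct reciprocity check follows, assuming the columns are named rater and ratee as in the question; the names forward, reverse and reciprocal are illustrative only:

```r
rater <- c(21, 23, 26, 24)
ratee <- c(24, 21, 23, 21)
rating.data <- data.frame(rater, ratee)

# Encode each row as "rater_ratee" and as its swapped form "ratee_rater".
forward <- paste(rating.data$rater, rating.data$ratee, sep = "_")
reverse <- paste(rating.data$ratee, rating.data$rater, sep = "_")

# A row is reciprocal when its pair also occurs swapped somewhere in the data.
reciprocal <- rating.data[forward %in% reverse, ]
reciprocal
```

A self-rating (rater equal to ratee) would match itself under this check; adding `& rating.data$rater != rating.data$ratee` to the condition excludes such rows if they are not wanted.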
akrun answered Oct 25 '22 20:10


Another possibility:

library(dplyr)
rating.data %>% inner_join(.,.,by=c("rater" = "ratee","ratee"="rater"))

Or this, which for some reason is about twice as fast on your small example (though still slower than akrun's solution):

merge(rating.data,setNames(rating.data,rev(names(rating.data))))

To keep the second solution working when the data frame has additional columns, swap only the two key columns:

merge(rating.data,setNames(rating.data[,c("rater","ratee")],c("ratee","rater")))
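
As a rough illustration of the additional-columns case, here is a sketch using the question's data plus a hypothetical score column (that column and the names swapped and reciprocal are assumptions made for the example, not part of the original data):

```r
# Question data plus a hypothetical extra column `score`.
rating.data <- data.frame(
  rater = c(21, 23, 26, 24),
  ratee = c(24, 21, 23, 21),
  score = c(5, 3, 4, 2)
)

# Swapped copy of just the two key columns: rater becomes ratee and vice versa.
swapped <- setNames(rating.data[, c("rater", "ratee")], c("ratee", "rater"))

# merge() joins on the shared column names (rater, ratee), so only rows whose
# swapped pair exists survive; `score` is carried along from the first argument.
reciprocal <- merge(rating.data, swapped)
reciprocal
```

Note that merge() does not keep the original row names and, by default, sorts the result on the join columns. If a swapped pair occurs more than once, the matching row is repeated once per match.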
Moody_Mudskipper answered Oct 25 '22 21:10


library(data.table)

n_rows <- 10  # number of rows in the toy example
dt1 <- data.table(a = 1:n_rows, b = sample(n_rows))  # random pairs to illustrate

# Build one key per unordered pair, always written as "min_max"
dt1[, d := ifelse(a < b, paste0(a, "_", b), paste0(b, "_", a))]
setkey(dt1, d)  # key the table on the pair id so it can be joined on

# Join back to the pair ids that occur more than once, i.e. the reciprocal pairs
dt1[dt1[, .N, d][N != 1], .(a, b)]
amonk answered Oct 25 '22 21:10


You can also use pmin and pmax to assist with grouping and then filter on all groups having more than one entry, i.e.

library(dplyr)

rating.data %>% 
 group_by(grp = paste(pmin(rater, ratee), pmax(rater, ratee), sep = "_")) %>% 
 filter(n() > 1) %>% 
 ungroup() %>% 
 select(-grp)

which gives,

# A tibble: 2 x 2
  rater ratee
  <dbl> <dbl>
1    21    24
2    24    21
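
The grouping key relies on each unordered pair mapping to a distinct string. When IDs can have different numbers of digits, joining the two numbers without a separator can make different pairs collide. A small base-R sketch of the effect, using hypothetical IDs that do not come from the question:

```r
# Hypothetical IDs chosen to show the collision; they are not from the question.
a <- c(1, 11)
b <- c(123, 23)

# Without a separator both pairs collapse to the same key "1123".
no_sep <- paste0(pmin(a, b), pmax(a, b))

# With a separator the keys stay distinct: "1_123" and "11_23".
with_sep <- paste(pmin(a, b), pmax(a, b), sep = "_")

no_sep[1] == no_sep[2]      # TRUE: the two different pairs share one key
with_sep[1] == with_sep[2]  # FALSE: the separator keeps them apart
```

With two-digit IDs like those in the question no collision occurs, so this only matters for data with mixed-length IDs.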
Sotos answered Oct 25 '22 20:10