"Not Join" in R

I am looking for a quick way to do a 'not join' (i.e. keep the rows that didn't merge, the inverse of an inner join). The way I've been doing it is to use data.table for X and Y, then set keys. For example:

require(data.table)

X <- data.table(category = c('A','B','C','D'), val1 = c(0.2,0.3,0.8,0.7))
Y <- data.table(category = c('B','C','D','E'), val2 = c(2,3,5,7))
XY <- merge(X,Y,by='category')

> XY
   category val1 val2
1:        B  0.3    2
2:        C  0.8    3
3:        D  0.7    5

But I need the inverse of this, so I have to do:

XY_All <- merge(X,Y,by='category',all=TRUE)
setkey(XY,category)
setkey(XY_All,category)
notXY <- XY_All[!XY]    #data.table not join (finally)

> notXY
   category val1 val2
1:        A  0.2   NA
2:        E   NA    7

I feel like this is quite long-winded (especially when starting from a data.frame). Am I missing something?

EDIT: I got this after thinking more about not joins:

X <- data.table(category = c('A','B','C','D'), val1 = c(0.2,0.3,0.8,0.7),key = "category")
Y <- data.table(category = c('B','C','D','E'), val2 = c(2,3,5,7), key = "category")
notXY <- merge(X[!Y],Y[!X],all=TRUE)

But WheresTheAnyKey's answer below is clearer. One last hurdle is having to preset the data.table keys; it would be nice not to have to do that.

EDIT: To clarify, the accepted solution is:

merge(anti_join(X, Y, by = 'category'),anti_join(Y, X, by = 'category'), by = 'category', all = TRUE)

tanvach asked Jun 12 '14 17:06


3 Answers

require(dplyr)
# Note: rbind_list() was deprecated in later dplyr releases in favour of bind_rows()
rbind_list(anti_join(X, Y), anti_join(Y, X))

EDIT: Since someone asked for some explanation, here's what is happening:

The first anti_join() call returns the rows of X that have no matching row in Y, where a match is determined by the join columns (by default the columns the two tables share, here category). The second call does the reverse. rbind_list() then stacks both results into a single tbl containing every observation from each input, filling variables that are missing from an input with NA.
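
For readers on a current dplyr, the same idea can be sketched as below. This is only an illustration: it swaps in bind_rows() for rbind_list(), names the join column explicitly with by = "category", and uses plain data frames rebuilt from the question's data. The intermediate names only_X and only_Y are made up for the example.

```r
library(dplyr)

# Same data as in the question, as plain data frames
X <- data.frame(category = c("A", "B", "C", "D"),
                val1 = c(0.2, 0.3, 0.8, 0.7),
                stringsAsFactors = FALSE)
Y <- data.frame(category = c("B", "C", "D", "E"),
                val2 = c(2, 3, 5, 7),
                stringsAsFactors = FALSE)

# Rows of X with no match in Y, and rows of Y with no match in X
only_X <- anti_join(X, Y, by = "category")
only_Y <- anti_join(Y, X, by = "category")

# Stack the two results; columns missing from one input are filled with NA
notXY <- bind_rows(only_X, only_Y)
notXY
```

Naming the join column explicitly avoids the message dplyr prints when it has to guess the common columns.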


stanekam answered Sep 19 '22 00:09


setkey(X,category)
setkey(Y,category)

rbind(X[!Y], Y[!X], fill = TRUE)
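
If presetting keys is the sticking point, a possible variation is to pass the join column through the on= argument instead of calling setkey(). The sketch below assumes a data.table version that supports on= (1.9.6 or later) and rebuilds X and Y from the question without keys.

```r
library(data.table)

# Same data as in the question, with no key set
X <- data.table(category = c("A", "B", "C", "D"), val1 = c(0.2, 0.3, 0.8, 0.7))
Y <- data.table(category = c("B", "C", "D", "E"), val2 = c(2, 3, 5, 7))

# Not-join in both directions using an ad hoc join column,
# then stack the results; fill = TRUE pads missing columns with NA
notXY <- rbind(X[!Y, on = "category"],
               Y[!X, on = "category"],
               fill = TRUE)
notXY
```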

Mike.Gahan answered Sep 17 '22 00:09


You can make it more concise like this:

X <- data.table(category = c('A','B','C','D'), val1 = c(0.2,0.3,0.8,0.7),key = "category")
Y <- data.table(category = c('B','C','D','E'), val2 = c(2,3,5,7), key = "category")
notXY <- merge(X,Y,all = TRUE)[!merge(X,Y)]
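
When starting from plain data frames, the same "full merge minus the inner matches" idea can be approximated in base R without data.table keys. This is a rough sketch that assumes a single join column named category; the helper variable matched is hypothetical.

```r
# Same data as in the question, as plain data frames
X <- data.frame(category = c("A", "B", "C", "D"),
                val1 = c(0.2, 0.3, 0.8, 0.7),
                stringsAsFactors = FALSE)
Y <- data.frame(category = c("B", "C", "D", "E"),
                val2 = c(2, 3, 5, 7),
                stringsAsFactors = FALSE)

# Full outer merge, then drop every category present in both inputs
XY_All  <- merge(X, Y, by = "category", all = TRUE)
matched <- intersect(X$category, Y$category)
notXY   <- XY_All[!XY_All$category %in% matched, ]
notXY
```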

WheresTheAnyKey answered Sep 19 '22 00:09