Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I perform a "setdiff" merge using data.table?

Tags:

r

data.table

Say I have two tables:

library(data.table)
set.seed(1)

tab1 <- data.table(
  let = rep(letters[1:2], each = 3),
  num = rep(1:3, 2),
  val = rnorm(6),
  key = c("let", "num")
)

tab2 <- data.table(
  let = rep(letters[1:2], each = 2),
  num = rep(1:2, 2),
  val = rnorm(4),
  key = c("let", "num")
)

Table 1:

> tab1
   let num        val
1:   a   1 -0.6264538
2:   a   2  0.1836433
3:   a   3 -0.8356286
4:   b   1  1.5952808
5:   b   2  0.3295078
6:   b   3 -0.8204684

Table 2:

> tab2
   let num
1:   a   1
2:   a   2
3:   b   1
4:   b   2

Is there a way to "merge" these tables such that I get all the results in tab1 that are not in tab2?:

   let num        val
1:   a   3 -0.8356286
2:   b   3 -0.8204684
like image 879
sebastian-c Avatar asked Nov 12 '15 08:11

sebastian-c


People also ask

How do I merge two data tables in R?

To join two data frames (datasets) vertically, use the rbind function. The two data frames must have the same variables, but they do not have to be in the same order. If data frameA has variables that data frameB does not, then either: Delete the extra variables in data frameA or.


1 Answers

In this case, it's equivalent to an anti join:

tab1[!tab2, on=c("let", "num")]

But setdiff() would only the first row for every let,num. This is marked for v1.9.8, FR #547.

like image 85
Arun Avatar answered Sep 28 '22 08:09

Arun