R data.table - join with a filter and update

Tags:

r

data.table

I'm trying to figure out how to join two data.tables and update the first one, but only for the rows that pass a filter.

library(data.table)
DT <- data.table(a = rep(1:3, 3), b = 1:9)
DT
   a b
1: 1 1
2: 2 2
3: 3 3
4: 1 4
5: 2 5
6: 3 6
7: 1 7
8: 2 8
9: 3 9

DT2 <- data.table(b = 1:9, c = rep(10, 9))
DT2
   b  c
1: 1 10
2: 2 10
3: 3 10
4: 4 10
5: 5 10
6: 6 10
7: 7 10
8: 8 10
9: 9 10

I can do a basic equijoin like this:

DT[DT2, on=c(b="b")]
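For reference, here is a sketch of the unfiltered update join, assuming the data.table package is installed. Combining on with := modifies the table in place instead of returning a new one, and the i. prefix refers to a column of the table passed as i (here DT2). The names joined and upd are illustrative; copy() is used only so that DT itself stays untouched.

```r
library(data.table)

DT  <- data.table(a = rep(1:3, 3), b = 1:9)
DT2 <- data.table(b = 1:9, c = rep(10, 9))

# Plain equijoin: returns a NEW table with columns a, b, c; DT is unchanged
joined <- DT[DT2, on = c(b = "b")]

# Update join on a copy: adds column c by reference to every row whose b
# matches a row of DT2; i.c is the column c of the i table (DT2)
upd <- copy(DT)
upd[DT2, on = c(b = "b"), c := i.c]
print(upd)
```

This has no filter yet; the question is how to restrict the := to rows where a == 3.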

But what I'd logically like to do is this:

DT[a==3,DT2, on=c(b="b")]

but I get the following error:

Error in `[.data.table`(DT, a == 3, DT2, on = c(b = "b")) : 
  logical error. i is not a data.table, but 'on' argument is provided.

I can reverse the order of the join and apply the filter to DT instead:

DT2[DT[a==3,], on=c(b="b")]

   b  c a
1: 3 10 3
2: 6 10 3
3: 9 10 3

This gives the correct rows, but the columns are in the wrong order. That aside, what I really want is to update DT with c in place, but only for the rows that pass the filter on DT and also satisfy the join.
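If only the column order of the reversed join is the problem, setcolorder() can rearrange the result in place. This is a minimal sketch; note that it still produces a separate table (res is an illustrative name) rather than updating DT.

```r
library(data.table)

DT  <- data.table(a = rep(1:3, 3), b = 1:9)
DT2 <- data.table(b = 1:9, c = rep(10, 9))

# Reversed join: DT2's columns (b, c) come first, then i's extra column a
res <- DT2[DT[a == 3], on = c(b = "b")]

# Reorder the columns by reference to match DT's layout plus the new column
setcolorder(res, c("a", "b", "c"))
print(res)
```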

If this were SQL, I would use an UPDATE with a subquery, like so:

UPDATE
    DT
SET
    c = (SELECT c FROM DT2 WHERE DT2.b = DT.b)
WHERE
    DT.a = 3
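One way to mirror that SQL statement almost clause by clause is to put the WHERE condition in i and use base R's match() as a stand-in for the correlated subquery. This is a swapped-in technique, not one of the answers below, and a sketch assuming data.table is installed.

```r
library(data.table)

DT  <- data.table(a = rep(1:3, 3), b = 1:9)
DT2 <- data.table(b = 1:9, c = rep(10, 9))

# WHERE DT.a = 3            -> the i filter a == 3
# (SELECT c ... DT2.b = b)  -> DT2$c at the position of the first matching b
# Rows outside the filter are left as NA in the new column c
DT[a == 3, c := DT2$c[match(b, DT2$b)]]
print(DT)
```

One difference from SQL: if DT2 contained duplicate values of b, the SQL subquery would raise an error, whereas match() silently takes the first match.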

I seem to be going in circles with the data.table syntax. Can anyone point me in the right direction?

Cheers

David

asked Jan 29 '23 11:01 by Bravid

2 Answers

Another option, which avoids creating a dummy variable, is:

DT[a==3, c := DT2[DT[a==3], c, on = c(b="b")]]
DT
#   a b  c
#1: 1 1 NA
#2: 2 2 NA
#3: 3 3 10
#4: 1 4 NA
#5: 2 5 NA
#6: 3 6 10
#7: 1 7 NA
#8: 2 8 NA
#9: 3 9 10
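To see why this works, it can help to evaluate the inner join on its own. The sketch below (vals is an illustrative name) shows that it returns one value of c per row of DT[a == 3], in the same order as those rows, which is exactly what the outer := needs.

```r
library(data.table)

DT  <- data.table(a = rep(1:3, 3), b = 1:9)
DT2 <- data.table(b = 1:9, c = rep(10, 9))

# Inner expression alone: c from DT2 for each filtered row of DT, in order
vals <- DT2[DT[a == 3], c, on = c(b = "b")]
print(vals)

# The outer := assigns that vector to exactly the rows with a == 3
DT[a == 3, c := vals]
```

This relies on each filtered row matching at most one row of DT2; if b were duplicated in DT2, the joined vector could become longer than the filtered subset and would no longer line up with it.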
answered Feb 01 '23 09:02 by Mike H.


You can add a dummy column a to DT2, join on both a and b, and then update by reference:

DT[DT2[, c(a = 3, .SD)], c := i.c, on = c("a", "b")]

DT
#   a b  c
#1: 1 1 NA
#2: 2 2 NA
#3: 3 3 10
#4: 1 4 NA
#5: 2 5 NA
#6: 3 6 10
#7: 1 7 NA
#8: 2 8 NA
#9: 3 9 10
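The key step is the helper table built inside i. The sketch below materializes it as keyed (an illustrative name) so it can be inspected. It uses 3L rather than 3 so that the new column has the same integer type as DT$a; that choice is an assumption made here for clarity, and the answer's plain 3 also produces the result shown above.

```r
library(data.table)

DT  <- data.table(a = rep(1:3, 3), b = 1:9)
DT2 <- data.table(b = 1:9, c = rep(10, 9))

# c(a = 3L, .SD) prepends a constant column a to DT2's columns;
# data.table recycles the length-1 value to every row
keyed <- DT2[, c(a = 3L, .SD)]
print(keyed)

# Joining on both a and b means only rows of DT with a == 3 can match,
# so the filter is expressed through the join itself
DT[keyed, c := i.c, on = c("a", "b")]
```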
answered Feb 01 '23 09:02 by Psidom