Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

merge and replace values in two data.tables

Tags:

r

data.table

I am quite new to the data.table package and have a simple problem. I have two data.tables that are compared with the use of keys. In data.table 1 the value of column C is changed from "NO" to "OK" if the key columns A and B are equally found in data.table B. This step is inevitably and has to be done.

library(data.table)
df_1 <- data.frame(A=c(1,1,3,5,6,7), B = c("x","y","z","q","w","e"), C = rep("NO",6))
df_2 <- data.frame(A=c(3,5,1), B = c("z","q","x"), D=c(3,5,99))
keys <- c("A","B")
dt_1 <- data.table(df_1, key = keys)
dt_2 <- data.table(df_2, key = keys)
dt_1[dt_2, C := "OK"]

Now I get the data.table:

   A     B     C
1: 1     x     OK
2: 1     y     NO
3: 3     z     OK
4: 5     q     OK
5: 6     w     NO
6: 7     e     NO

I would like to include a second operation. If in data.table 2 the value of column A is not equal to column D the value of column D should be used after the first operation. Meaning column D is superior to A. This should work no matter how many values in D are different. The desired data.table looks the following:

   A     B     C
1: 99    x     OK
2: 1     y     NO
3: 3     z     OK
4: 5     q     OK
5: 6     w     NO
6: 7     e     NO

I tired something without success.

dt_1[dt_2, A != D, A := D]

Thank you for your help!

like image 758
VDK Avatar asked Sep 03 '15 08:09

VDK


People also ask

How do I merge two data tables in R?

To join two data frames (datasets) vertically, use the rbind function. The two data frames must have the same variables, but they do not have to be in the same order. If data frameA has variables that data frameB does not, then either: Delete the extra variables in data frameA or.


1 Answers

Try:

dt_1[C == "OK", A:= dt_2[,D]]

#   A B  C
# 1: 99 x OK
# 2:  1 y NO
# 3:  3 z OK
# 4:  5 q OK
# 5:  6 w NO
# 6:  7 e NO

And here's how you should have done the whole process in the first place.

Create both data sets as data.tables in the first place (or convert in place using setDT)

dt_1 <- data.table(A=c(1,1,3,5,6,7), B = c("x","y","z","q","w","e"), C = rep("NO",6))
dt_2 <- data.table(A=c(3,5,1), B = c("z","q","x"), D=c(3,5,99))

Then key them using setkeyv instead of using the <- operator

keys <- c("A","B")
setkeyv(dt_1, keys)
setkeyv(dt_2, keys)

Then just update both column within a single join

dt_1[dt_2, `:=`(C = "OK", A = i.D)]
#     A B  C
# 1: 99 x OK
# 2:  1 y NO
# 3:  3 z OK
# 4:  5 q OK
# 5:  6 w NO
# 6:  7 e NO

In this case the condition df_1$A != df_2$D is redundant

like image 139
Andriy T. Avatar answered Oct 22 '22 06:10

Andriy T.