I am quite new to the data.table package and have a simple problem. I have two data.tables that are compared with the use of keys. In data.table 1 the value of column C is changed from "NO" to "OK" if the key columns A and B are equally found in data.table B. This step is inevitably and has to be done.
library(data.table)
df_1 <- data.frame(A=c(1,1,3,5,6,7), B = c("x","y","z","q","w","e"), C = rep("NO",6))
df_2 <- data.frame(A=c(3,5,1), B = c("z","q","x"), D=c(3,5,99))
keys <- c("A","B")
dt_1 <- data.table(df_1, key = keys)
dt_2 <- data.table(df_2, key = keys)
dt_1[dt_2, C := "OK"]
Now I get the data.table:
A B C
1: 1 x OK
2: 1 y NO
3: 3 z OK
4: 5 q OK
5: 6 w NO
6: 7 e NO
I would like to include a second operation. If in data.table 2 the value of column A is not equal to column D the value of column D should be used after the first operation. Meaning column D is superior to A. This should work no matter how many values in D are different. The desired data.table looks the following:
A B C
1: 99 x OK
2: 1 y NO
3: 3 z OK
4: 5 q OK
5: 6 w NO
6: 7 e NO
I tired something without success.
dt_1[dt_2, A != D, A := D]
Thank you for your help!
To join two data frames (datasets) vertically, use the rbind function. The two data frames must have the same variables, but they do not have to be in the same order. If data frameA has variables that data frameB does not, then either: Delete the extra variables in data frameA or.
Try:
dt_1[C == "OK", A:= dt_2[,D]]
# A B C
# 1: 99 x OK
# 2: 1 y NO
# 3: 3 z OK
# 4: 5 q OK
# 5: 6 w NO
# 6: 7 e NO
And here's how you should have done the whole process in the first place.
Create both data sets as data.table
s in the first place (or convert in place using setDT
)
dt_1 <- data.table(A=c(1,1,3,5,6,7), B = c("x","y","z","q","w","e"), C = rep("NO",6))
dt_2 <- data.table(A=c(3,5,1), B = c("z","q","x"), D=c(3,5,99))
Then key them using setkeyv
instead of using the <-
operator
keys <- c("A","B")
setkeyv(dt_1, keys)
setkeyv(dt_2, keys)
Then just update both column within a single join
dt_1[dt_2, `:=`(C = "OK", A = i.D)]
# A B C
# 1: 99 x OK
# 2: 1 y NO
# 3: 3 z OK
# 4: 5 q OK
# 5: 6 w NO
# 6: 7 e NO
In this case the condition df_1$A != df_2$D
is redundant
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With