merge and replace values in two data.tables

Tags: r, data.table

I am quite new to the data.table package and have a simple problem. I have two data.tables that are compared using keys. In data.table 1, the value of column C is changed from "NO" to "OK" if the values of the key columns A and B are also found in data.table 2. This step is mandatory.

library(data.table)
df_1 <- data.frame(A=c(1,1,3,5,6,7), B = c("x","y","z","q","w","e"), C = rep("NO",6))
df_2 <- data.frame(A=c(3,5,1), B = c("z","q","x"), D=c(3,5,99))
keys <- c("A","B")
dt_1 <- data.table(df_1, key = keys)
dt_2 <- data.table(df_2, key = keys)
dt_1[dt_2, C := "OK"]

Now I get the data.table:

   A     B     C
1: 1     x     OK
2: 1     y     NO
3: 3     z     OK
4: 5     q     OK
5: 6     w     NO
6: 7     e     NO

I would like to add a second operation. If, in data.table 2, the value of column A is not equal to the value of column D, the value of column D should replace A in data.table 1 after the first operation. In other words, column D takes precedence over A. This should work no matter how many values in D differ. The desired data.table looks like this:

   A     B     C
1: 99    x     OK
2: 1     y     NO
3: 3     z     OK
4: 5     q     OK
5: 6     w     NO
6: 7     e     NO

I tried the following, without success:

dt_1[dt_2, A != D, A := D]

Thank you for your help!

asked Sep 03 '15 08:09

VDK

1 Answer

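Try the following. Note that it assigns D by position, so it relies on both tables being sorted by the same key and on every row of dt_2 having exactly one match in dt_1: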

dt_1[C == "OK", A:= dt_2[,D]]

#     A B  C
# 1: 99 x OK
# 2:  1 y NO
# 3:  3 z OK
# 4:  5 q OK
# 5:  6 w NO
# 6:  7 e NO

And here's how you should have done the whole process in the first place.

Create both data sets as data.tables from the start (or convert existing data.frames by reference using setDT):

dt_1 <- data.table(A=c(1,1,3,5,6,7), B = c("x","y","z","q","w","e"), C = rep("NO",6))
dt_2 <- data.table(A=c(3,5,1), B = c("z","q","x"), D=c(3,5,99))
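If the data already exists as a data.frame, the setDT route mentioned above might look like the following minimal sketch. It reuses the question's df_1; stringsAsFactors = FALSE is added here as an assumption so that column C stays a character column on older R versions.

```r
library(data.table)

# The question's first data set, still a plain data.frame at this point.
df_1 <- data.frame(A = c(1, 1, 3, 5, 6, 7),
                   B = c("x", "y", "z", "q", "w", "e"),
                   C = rep("NO", 6),
                   stringsAsFactors = FALSE)

# setDT converts the object to a data.table by reference,
# so no copy is made and no assignment with <- is needed.
setDT(df_1)

is.data.table(df_1)
```

After the call, df_1 can be used wherever dt_1 is used in the rest of the answer.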

Then key them using setkeyv, which sets the key by reference, instead of creating new objects with the <- operator:

keys <- c("A","B")
setkeyv(dt_1, keys)
setkeyv(dt_2, keys)
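To confirm that the key was actually set, a quick check along these lines can help. This is only a sketch that repeats the dt_1 setup so it runs on its own; key() and haskey() are standard data.table helpers.

```r
library(data.table)

dt_1 <- data.table(A = c(1, 1, 3, 5, 6, 7),
                   B = c("x", "y", "z", "q", "w", "e"),
                   C = rep("NO", 6))

keys <- c("A", "B")
setkeyv(dt_1, keys)  # sorts dt_1 by A, then B, and marks those columns as the key

key(dt_1)     # the names of the key columns
haskey(dt_1)  # TRUE when a key is set
```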

Then just update both columns within a single join:

dt_1[dt_2, `:=`(C = "OK", A = i.D)]
#     A B  C
# 1: 99 x OK
# 2:  1 y NO
# 3:  3 z OK
# 4:  5 q OK
# 5:  6 w NO
# 6:  7 e NO

In this case the condition dt_1$A != dt_2$D is redundant: where D already equals A, assigning D to A leaves the value unchanged.
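If the comparison should nevertheless be spelled out, one possible sketch is to filter dt_2 before joining. It uses the on= argument instead of keys, which assumes data.table 1.9.6 or later. The two-step split is an illustration, not part of the original answer.

```r
library(data.table)

dt_1 <- data.table(A = c(1, 1, 3, 5, 6, 7),
                   B = c("x", "y", "z", "q", "w", "e"),
                   C = rep("NO", 6))
dt_2 <- data.table(A = c(3, 5, 1), B = c("z", "q", "x"), D = c(3, 5, 99))

# Step 1: flag every row of dt_1 whose (A, B) pair appears in dt_2.
dt_1[dt_2, on = c("A", "B"), C := "OK"]

# Step 2: overwrite A only for the dt_2 rows where D differs from A.
# i.D refers to column D of the table passed as i (the filtered dt_2).
dt_1[dt_2[A != D], on = c("A", "B"), A := i.D]

dt_1
```

Only the row with A = 1 and B = "x" is touched in step 2, which gives the same table as the single-join version.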

answered Oct 22 '22 06:10

Andriy T.
