Suppose I would like to track which rows from one data.table were merged to another data.table. Is there a way to do this at once, while merging? Please see my example below and the way I usually do it, which seems rather inefficient.
library(data.table)
# initial data
DT = data.table(x = c(1,1,1,2,2,1,1,2,2),
                y = c(1,3,6))
# data to merge
DTx <- data.table(x = 1:3,
                  y = 1,
                  k = "X")
# regular update join
copy(DT)[DTx,
         on = .(x, y),
         k := i.k][]
#> x y k
#> 1: 1 1 X
#> 2: 1 3 <NA>
#> 3: 1 6 <NA>
#> 4: 2 1 X
#> 5: 2 3 <NA>
#> 6: 1 6 <NA>
#> 7: 1 1 X
#> 8: 2 3 <NA>
#> 9: 2 6 <NA>
# DTx remains the same
DTx
#> x y k
#> 1: 1 1 X
#> 2: 2 1 X
#> 3: 3 1 X
# set an Id variable
DTx[, Id := .I]
# assign the Id in merge
DT[DTx,
   on = .(x, y),
   `:=`(k = i.k,
        matched_id = i.Id)][]
#> x y k matched_id
#> 1: 1 1 X 1
#> 2: 1 3 <NA> NA
#> 3: 1 6 <NA> NA
#> 4: 2 1 X 2
#> 5: 2 3 <NA> NA
#> 6: 1 6 <NA> NA
#> 7: 1 1 X 1
#> 8: 2 3 <NA> NA
#> 9: 2 6 <NA> NA
# use matched_id to find merged rows
DTx[, matched := fifelse(Id %in% DT$matched_id, TRUE, FALSE)]
DTx
#> x y k Id matched
#> 1: 1 1 X 1 TRUE
#> 2: 2 1 X 2 TRUE
#> 3: 3 1 X 3 FALSE
Following Jan's comment:
This will provide you indices of matching rows but you will have to call merge again to perform actual merging, unless you manually use provided indices to match/update those tables.
You can pull the indices:
merge_metaDT = DT[DTx, on=.(x, y), .(irow = .GRP, xrow = .I), by=.EACHI]
x y irow xrow
1: 1 1 1 1
2: 1 1 1 7
3: 2 1 2 4
4: 3 1 3 0
Then apply edits to each table using indices rather than merging or matching a second time:
rowDT = merge_metaDT[xrow != 0L]
DT[rowDT$xrow, k := DTx[rowDT$irow, k]]
DTx[, matched := FALSE][rowDT$irow, matched := TRUE]
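Putting the steps above together, a minimal end-to-end sketch might look like the following. It assumes the DT and DTx from the question, and that rows of DTx without a match are reported with xrow equal to 0, as in the output above. The extra NA check is a defensive assumption in case that coding differs between data.table versions.

```r
library(data.table)

# Data from the question
DT  <- data.table(x = c(1, 1, 1, 2, 2, 1, 1, 2, 2), y = c(1, 3, 6))
DTx <- data.table(x = 1:3, y = 1, k = "X")

# One join: for every row of DTx (irow), record the matching rows of DT (xrow)
merge_metaDT <- DT[DTx, on = .(x, y), .(irow = .GRP, xrow = .I), by = .EACHI]

# Keep only real matches; unmatched rows of DTx appear with xrow == 0 above
# (the is.na() check is only a precaution)
rowDT <- merge_metaDT[!is.na(xrow) & xrow != 0L]

# Update both tables by row index, without joining or matching again
DT[rowDT$xrow, k := DTx$k[rowDT$irow]]
DTx[, matched := FALSE][rowDT$irow, matched := TRUE]
```

After this, DT carries the merged column k and DTx carries a matched flag, both derived from the single join stored in merge_metaDT.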
How it works:
In x[i], the symbol .I indexes rows of x.
With by=.EACHI, .GRP indexes each group, which means each row of i here.
Rows of i without a match in x get a .I value, and these are coded as zeros.
On this last point, we might expect NAs instead of zeros, as returned by DT[DTx, on=.(x, y), which=TRUE]. I'm not sure why these differ.
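For comparison, here is a small sketch of the which=TRUE form mentioned above, again using the question's data. The variable names w_na and w_inner are only illustrative. With the default nomatch = NA, a row of DTx without a match is reported as NA; with nomatch = NULL (available in reasonably recent data.table versions) it is dropped.

```r
library(data.table)

# Data from the question
DT  <- data.table(x = c(1, 1, 1, 2, 2, 1, 1, 2, 2), y = c(1, 3, 6))
DTx <- data.table(x = 1:3, y = 1, k = "X")

# Row numbers of DT matched by each row of DTx, in the order of DTx;
# an unmatched row of DTx yields NA here rather than 0
w_na <- DT[DTx, on = .(x, y), which = TRUE]

# Same join, but unmatched rows of DTx are dropped from the result
w_inner <- DT[DTx, on = .(x, y), which = TRUE, nomatch = NULL]
```

Note that which=TRUE returns only the row numbers of DT, so unlike the .GRP/.I table it does not say which row of DTx produced each match.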
Suppose I would like to track which rows from one data.table were merged to another data.table. is there a way to do this at once/while merging? [...] seems rather inefficient.
I expect this is more efficient than multiple merges or %in% when the merge is costly enough.
It still requires multiple steps. I doubt there's any way around that, since it would be hard to come up with logic and syntax for the update that is easy to follow.
Update logic is already complex in base R, with multiple edits on a single index allowed:
> x = c(1, 2, 3)
> x[c(1, 1)] = c(4, 5)
> x
[1] 5 2 3
And there is the question of how to match and edit multiple indices at once:
> x = c(1, 1, 3)
> x[match(c(1, 3), x)] = c(4, 5)
> x
[1] 4 1 5
In data.table updates, the latter issue is handled with mult=. In the update-two-tables use case, these questions would get much more complicated.
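As a rough illustration of mult= in an update join, the sketch below reuses the question's data. DT_all and DT_first are hypothetical names introduced here; the point is only that the default mult = "all" updates every matching row of DT, while mult = "first" updates just the first match for each row of DTx.

```r
library(data.table)

# Data from the question
DT  <- data.table(x = c(1, 1, 1, 2, 2, 1, 1, 2, 2), y = c(1, 3, 6))
DTx <- data.table(x = 1:3, y = 1, k = "X")

# Default mult = "all": every row of DT that matches a row of DTx is updated
DT_all <- copy(DT)[DTx, on = .(x, y), k := i.k]

# mult = "first": only the first matching row of DT per row of DTx is updated
DT_first <- copy(DT)[DTx, on = .(x, y), mult = "first", k := i.k]

# Rows of DT that received a value of k under each setting
which(!is.na(DT_all$k))
which(!is.na(DT_first$k))
```

Here (x = 1, y = 1) appears twice in DT, so the two settings differ in whether the second occurrence is updated.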