How to update both data.tables in a join

Suppose I would like to track which rows from one data.table were merged to another data.table. Is there a way to do this at once/while merging? Please see my example below and the way I usually do it. However, this seems rather inefficient.

Example

library(data.table)

# initial data
DT = data.table(x = c(1,1,1,2,2,1,1,2,2), 
                y = c(1,3,6))

# data to merge
DTx <- data.table(x = 1:3,
                  y = 1,
                  k = "X")

# regular update join
copy(DT)[DTx,
         on = .(x, y),
         k := i.k][]
#>    x y    k
#> 1: 1 1    X
#> 2: 1 3 <NA>
#> 3: 1 6 <NA>
#> 4: 2 1    X
#> 5: 2 3 <NA>
#> 6: 1 6 <NA>
#> 7: 1 1    X
#> 8: 2 3 <NA>
#> 9: 2 6 <NA>

# DTx remains the same
DTx
#>    x y k
#> 1: 1 1 X
#> 2: 2 1 X
#> 3: 3 1 X

What I usually do:

# set an Id variable
DTx[, Id := .I]

# assign the Id in merge
DT[DTx,
   on = .(x, y),
   `:=`(k = i.k,
        matched_id = i.Id)][]
#>    x y    k matched_id
#> 1: 1 1    X          1
#> 2: 1 3 <NA>         NA
#> 3: 1 6 <NA>         NA
#> 4: 2 1    X          2
#> 5: 2 3 <NA>         NA
#> 6: 1 6 <NA>         NA
#> 7: 1 1    X          1
#> 8: 2 3 <NA>         NA
#> 9: 2 6 <NA>         NA

# use matched_id to find merged rows
DTx[, matched := Id %in% DT$matched_id]
DTx
#>    x y k Id matched
#> 1: 1 1 X  1    TRUE
#> 2: 2 1 X  2    TRUE
#> 3: 3 1 X  3   FALSE

Asked by mnist on Dec 02 '21 at 16:12


1 Answer

Following Jan's comment:

This will provide you indices of matching rows but you will have to call merge again to perform actual merging, unless you manually use provided indices to match/update those tables.

You can pull the indices:

merge_metaDT = DT[DTx, on=.(x, y), .(irow = .GRP, xrow = .I), by=.EACHI]

   x y irow xrow
1: 1 1    1    1
2: 1 1    1    7
3: 2 1    2    4
4: 3 1    3    0

Then apply edits to each table using indices rather than merging or matching a second time:

rowDT = merge_metaDT[xrow != 0L]
DT[rowDT$xrow, k := DTx[rowDT$irow, k]]
DTx[, matched := FALSE][rowDT$irow, matched := TRUE]
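
As a sanity check, the index-based steps can be compared against the two-step approach from the question. This is only a sketch on the toy data: the names ref_DT, ref_DTx and idx are made up for illustration, and the filter xrow > 0L relies on data.table treating NA in a logical i as FALSE, so it should also drop unmatched rows if they ever show up as NA instead of 0.

```r
library(data.table)

# Same toy data as in the question
DT  <- data.table(x = c(1, 1, 1, 2, 2, 1, 1, 2, 2), y = c(1, 3, 6))
DTx <- data.table(x = 1:3, y = 1, k = "X")

# Reference: the question's approach (update join, then %in% on an Id column)
ref_DT  <- copy(DT)
ref_DTx <- copy(DTx)[, Id := .I]
ref_DT[ref_DTx, on = .(x, y), `:=`(k = i.k, matched_id = i.Id)]
ref_DTx[, matched := Id %in% ref_DT$matched_id]

# Index-based approach: a single join collects (row of DTx, row of DT) pairs
idx <- DT[DTx, on = .(x, y), .(irow = .GRP, xrow = .I), by = .EACHI]
idx <- idx[xrow > 0L]  # keep matched pairs only

# Both tables are then edited by row number, without joining again
DT[idx$xrow, k := DTx$k[idx$irow]]
DTx[, matched := FALSE][idx$irow, matched := TRUE]

# Both approaches should agree on the toy data
stopifnot(identical(DT$k, ref_DT$k),
          identical(DTx$matched, ref_DTx$matched))
```

If the stopifnot() call passes silently, the two approaches produced the same k column in DT and the same matched flag in DTx.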

How it works:

  • When joining, x[i], the symbol .I indexes rows of x
  • When grouping in a join with by=.EACHI, .GRP indexes each group, which means each row of i here
  • We drop the non-matching values of .I which are coded as zeros

On this last point, we might expect NAs instead of zeros, as returned by DT[DTx, on=.(x, y), which=TRUE]. I'm not sure why these differ.
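
Since the two forms may code "no match" differently, a filter that tolerates both NA and 0 seems like the safer choice. The following is a small sketch on the question's data; w, m, matched_w and matched_m are illustrative names, and no assumption is made here about which coding a given data.table version uses for .I.

```r
library(data.table)

DT  <- data.table(x = c(1, 1, 1, 2, 2, 1, 1, 2, 2), y = c(1, 3, 6))
DTx <- data.table(x = 1:3, y = 1, k = "X")

# Row numbers of DT matched by each row of DTx;
# with the default nomatch = NA, an unmatched row of DTx yields NA
w <- DT[DTx, on = .(x, y), which = TRUE]

# The same join, collecting .I once per row of DTx
m <- DT[DTx, on = .(x, y), .(xrow = .I), by = .EACHI]

# Filters that work for either coding of "no match"
matched_w <- w[!is.na(w)]
matched_m <- m[!is.na(xrow) & xrow != 0L, xrow]

# After filtering, both routes should give the same row numbers of DT
stopifnot(identical(matched_w, matched_m))
```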


Suppose I would like to track which rows from one data.table were merged to another data.table. Is there a way to do this at once/while merging? [...] seems rather inefficient.

I expect this is more efficient than multiple merges or %in% when the merge is costly enough.

It still requires multiple steps. I doubt there's any way around that, since it would be hard to come up with logic and syntax for the update that is easy to follow.

Update logic is already complex in base R, with multiple edits on a single index allowed:

> x = c(1, 2, 3)
> x[c(1, 1)] = c(4, 5)
> x
[1] 5 2 3

And there is the question of how to match and edit multiple indices at once:

> x = c(1, 1, 3)
> x[match(c(1, 3), x)] = c(4, 5)
> x
[1] 4 1 5

In data.table updates, the latter issue is handled with mult=. In the update-two-tables use case, these questions would get much more complicated.
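
To illustrate how mult= picks among several matching rows, here is a short sketch on the question's data. It uses which=TRUE only to display the row numbers of DT that each setting would target, rather than performing an update; first_hit, last_hit and all_hits are illustrative names.

```r
library(data.table)

DT  <- data.table(x = c(1, 1, 1, 2, 2, 1, 1, 2, 2), y = c(1, 3, 6))
DTx <- data.table(x = 1:3, y = 1, k = "X")

# (x = 1, y = 1) occurs in rows 1 and 7 of DT, so mult= decides which is used
first_hit <- DT[DTx, on = .(x, y), mult = "first", which = TRUE]
last_hit  <- DT[DTx, on = .(x, y), mult = "last",  which = TRUE]
all_hits  <- DT[DTx, on = .(x, y), which = TRUE]  # default mult = "all"

first_hit  # 1 4 NA
last_hit   # 7 4 NA
all_hits   # 1 7 4 NA
```

With "first" or "last" there is one entry per row of DTx, while the default "all" returns every matching row of DT.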

Answered by Frank on Sep 30 '22 at 17:09