When doing an update-join, where the i
table has multiple rows per key, how can you control which row is returned?
In this example, the update-join returns the last row from dt2
library(data.table)
dt1 <- data.table(id = 1)
dt2 <- data.table(id = 1, letter = letters)
dt1[
dt2
, on = "id"
, letter := i.letter
]
dt1
# id letter
# 1: 1 z
How can I control it to return the 1st, 2nd, nth row, rather than defaulting to the last?
A couple of references similar to this by user @Frank
An UPDATE statement can include JOIN operations. An UPDATE can contain zero, one, or multiple JOIN operations.
Merge multiple rows using formulas To joint the values from several cells into one, you can use either the CONCATENATE function or concatenation operator (&). In Excel 2016 and higher, you can also use the CONCAT function. Any way, you supply cells as references and type the desired delimiters in-between.
In SQL Server, we can join two or more tables, but we cannot update the data of multiple tables in a single UPDATE statement.
First, specify the name of the table (t1) that you want to update in the UPDATE clause. Next, specify the new value for each column of the updated table. Then, again specify the table from which you want to update in the FROM clause.
The most flexible idea I can think of is to only join the part of dt2
which contains the rows you want. So, for the second row:
dt1[
dt2[, .SD[2], by=id]
, on = "id"
, letter := i.letter
]
dt1
# id letter
#1: 1 b
With a hat-tip to @Frank for simplifying the sub-select of dt2
.
How can I control it to return the 1st, 2nd, nth row, rather than defaulting to the last?
Not elegant, but sort-of works:
n = 3L
dt1[, v := dt2[.SD, on=.(id), x.letter[n], by=.EACHI]$V1]
A couple problems:
It doesn't select using GForce, eg as seen here:
> dt2[, letter[3], by=id, verbose=TRUE]
Detected that j uses these columns: letter
Finding groups using forderv ... 0.020sec
Finding group sizes from the positions (can be avoided to save RAM) ... 0.000sec
lapply optimization is on, j unchanged as 'letter[3]'
GForce optimized j to '`g[`(letter, 3)'
Making each group and running j (GForce TRUE) ... 0.000sec
id V1
1: 1 c
If n
is outside of 1:.N
for some joined groups, no warning will be given:
n = 40L
dt1[, v := dt2[.SD, on=.(id), x.letter[n], by=.EACHI]$V1]
Alternately, make a habit of checking that i
in an update join x[i]
is "keyed" by the join columns:
cols = "id"
stopifnot(nrow(dt2) == uniqueN(dt2, by=cols))
And then make a different i
table to join on if appropriate
mDT = dt2[, .(letter = letter[3L]), by=id]
dt1[mDT, on=cols, v := i.letter]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With