Here is an MWE.
library(data.table)
dta <- data.table(id = rep(1:2, each = 5), seq = rep(1:5, 2), val = 1:10)
dtb <- data.table(id = c(1, 1, 2, 2), fil = c(2, 3, 3, 4))
dtc <- data.table(id = c(1, 1, 2, 2), mval = rep(0, 4))
# For each row of dtb, average val over the rows of dta with the same id and seq < fil
for (ind in 1:4) dtc$mval[ind] <- mean(dta$val[dta$id == dtb$id[ind] & dta$seq < dtb$fil[ind]])
dtc
# id mval
# 1: 1 1.0
# 2: 1 1.5
# 3: 2 6.5
# 4: 2 7.0
dtc should have the same number of rows as dtb. For every row ind in dtc, dtc$id[ind] = dtb$id[ind] and dtc$mval[ind] = mean(dta$val[x]), where x is dta$id == dtb$id[ind] & dta$seq < dtb$fil[ind].
My data.tables are extremely large, so I am looking for a way to achieve the above with a minimal memory footprint. I was thinking of a non-equi join followed by a summarize, but I can't seem to get that to work. Hence the title of the question.
Would greatly appreciate any help, thanks!
NON EQUI JOIN. The SQL NON EQUI JOIN uses a comparison operator other than the equal sign, such as >, <, >=, or <=, in its join condition.
An equi join is a join that uses the equality operator; that is, an equi join is just a join that uses the equal sign. Some examples of equi joins are join conditions where we’re matching first name from one table to first name from another table, or, for example, where we’re matching...
A match is found when the join expression, which is based on an inequality operator, evaluates to true. Retrieving data from multiple tables based on any condition other than equality is called a NON-EQUI join. In such a join, we can use operators such as <, >, <=, >=, and BETWEEN.
We’ve already discussed several types of joins, including self joins and CROSS JOIN, INNER JOIN and OUTER JOIN. These types of joins typically appear with the equals sign (=).
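In data.table terms, the same idea can be sketched with an inequality inside on=. The snippet below is only an illustration using the tables from the question; the name matched is made up here. It materializes every matched row, which is useful for seeing what the join does but is probably not what you want on very large tables.

```r
library(data.table)

dta <- data.table(id = rep(1:2, each = 5), seq = rep(1:5, 2), val = 1:10)
dtb <- data.table(id = c(1, 1, 2, 2), fil = c(2, 3, 3, 4))

# Non-equi join: for each row of dtb, keep the rows of dta with the same id
# and seq < fil. The x. and i. prefixes pick columns from dta and dtb.
matched <- dta[dtb, on = .(id, seq < fil),
               .(id = i.id, fil = i.fil, seq = x.seq, val = x.val)]

# Averaging val per (id, fil) pair reproduces the means from the question.
matched[, .(mval = mean(val)), by = .(id, fil)]
```

Here one row of dtb can match several rows of dta, so matched has more rows than dtb.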
Maybe this helps:
dtc[, mval := dta[dtb, mean(val), on = .(id, seq < fil), by = .EACHI]$V1]
dtc
# id mval
#1: 1 1.0
#2: 1 1.5
#3: 2 6.5
#4: 2 7.0
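For anyone who wants to check this on their own data, here is a self-contained sketch that builds dtc directly from the join and compares it with the loop from the question. The names ref and res are made up for this example. With by = .EACHI, j is evaluated once per row of dtb, so the full set of matched rows should not need to be held in memory at once.

```r
library(data.table)

dta <- data.table(id = rep(1:2, each = 5), seq = rep(1:5, 2), val = 1:10)
dtb <- data.table(id = c(1, 1, 2, 2), fil = c(2, 3, 3, 4))

# Reference result: the row-by-row computation from the question.
ref <- vapply(seq_len(nrow(dtb)), function(ind) {
  mean(dta$val[dta$id == dtb$id[ind] & dta$seq < dtb$fil[ind]])
}, numeric(1))

# Non-equi join, aggregating once per row of dtb (by = .EACHI).
res <- dta[dtb, .(mval = mean(val)), on = .(id, seq < fil), by = .EACHI]

# One output row per row of dtb, in the same order as dtb.
dtc <- data.table(id = dtb$id, mval = res$mval)

# Both approaches should agree.
stopifnot(isTRUE(all.equal(dtc$mval, ref)))
dtc
```

Naming the aggregate inside .( ) avoids relying on the default V1 column name.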