Non-equi join, then summarize by group

Tags: r, data.table

Here is a minimal working example (MWE):

library(data.table)

dta <- data.table(id = rep(1:2, each = 5), seq = rep(1:5, 2), val = 1:10)
dtb <- data.table(id = c(1, 1, 2, 2), fil = c(2, 3, 3, 4))
dtc <- data.table(id = c(1, 1, 2, 2), mval = rep(0, 4))
# For each row of dtb: mean of val over rows of dta with the same id and seq < fil
for (ind in 1:4) dtc$mval[ind] <- mean(dta$val[dta$id == dtb$id[ind] & dta$seq < dtb$fil[ind]])

dtc
#    id      mval
# 1:  1       1.0
# 2:  1       1.5
# 3:  2       6.5
# 4:  2       7.0

dtc should have the same number of rows as dtb, and for every row ind of dtc:

  1. dtc$id[ind] = dtb$id[ind].
  2. dtc$mval[ind] = mean(dta$val[x]), where x is the logical index dta$id == dtb$id[ind] & dta$seq < dtb$fil[ind].
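The two rules above can be transcribed literally without preallocating dtc or assigning into it row by row. A minimal sketch, assuming base R's `vapply` over the row indices of `dtb`; `ref_mval` and `dtc_ref` are hypothetical names used only for illustration. It still scans all of `dta` once per row of `dtb`, so it is a correctness reference rather than a fast or memory-light solution.

```r
library(data.table)

dta <- data.table(id = rep(1:2, each = 5), seq = rep(1:5, 2), val = 1:10)
dtb <- data.table(id = c(1, 1, 2, 2), fil = c(2, 3, 3, 4))

# Rule 2, evaluated once per row of dtb (hypothetical helper name ref_mval)
ref_mval <- vapply(seq_len(nrow(dtb)), function(ind) {
  mean(dta$val[dta$id == dtb$id[ind] & dta$seq < dtb$fil[ind]])
}, numeric(1))

# Rule 1: ids are copied straight from dtb, so row counts match by construction
dtc_ref <- data.table(id = dtb$id, mval = ref_mval)
print(dtc_ref)
```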

My data.tables are extremely large, so I am looking for a way to achieve the above with a minimal memory footprint. I was thinking of a non-equi join followed by a grouped summary (hence the title), but I can't get that to work.
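For reference, the literal "non-equi join, then summarize" approach can be written in two steps: join, then group by the originating `dtb` row. A minimal sketch, assuming a helper column `row` (hypothetical, added so that rows of `dtb` sharing the same `id` stay separate) and `allow.cartesian = TRUE`, a standard argument of `[.data.table`, since on real data the join can return more rows than both inputs combined. In the joined result the `seq` column holds the `fil` values from `dtb`, which is how data.table reports the join columns of a non-equi join. The drawback for very large tables is that `joined` is fully materialized before it is summarized.

```r
library(data.table)

dta <- data.table(id = rep(1:2, each = 5), seq = rep(1:5, 2), val = 1:10)
dtb <- data.table(id = c(1, 1, 2, 2), fil = c(2, 3, 3, 4))

# Step 1: non-equi join; every matching (dtb row, dta row) pair becomes a row.
# 'row' is a hypothetical helper column recording which dtb row each match came from.
joined <- dta[dtb[, .(id, fil, row = .I)],
              on = .(id, seq < fil),
              allow.cartesian = TRUE]

# Step 2: summarize the materialized join, one group per original dtb row
res <- joined[, .(mval = mean(val)), by = row]
print(res)
```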

Would greatly appreciate any help, thanks!

asked Sep 18 '16 07:09 by Anirban Mukherjee



1 Answer

Maybe this helps:

dtc[, mval := dta[dtb, mean(val), on = .(id, seq < fil), by = .EACHI]$V1]
dtc
#   id mval
#1:  1  1.0
#2:  1  1.5
#3:  2  6.5
#4:  2  7.0
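A variant of the same one-step join that names the summary column instead of relying on the default `V1`, and builds the result directly from `dtb` rather than preallocating `dtc`. With `by = .EACHI`, `j` is evaluated once for each row of `dtb`, so the matches are aggregated group by group and the full joined table is not built as a separate object first. A sketch on the question's data; `res` and `dtc2` are illustrative names.

```r
library(data.table)

dta <- data.table(id = rep(1:2, each = 5), seq = rep(1:5, 2), val = 1:10)
dtb <- data.table(id = c(1, 1, 2, 2), fil = c(2, 3, 3, 4))

# One row per row of dtb: j = mean(val) is computed per dtb row via by = .EACHI
res <- dta[dtb, .(mval = mean(val)), on = .(id, seq < fil), by = .EACHI]

# res keeps dtb's row order, so ids can be taken from dtb directly
dtc2 <- data.table(id = dtb$id, mval = res$mval)
print(dtc2)
```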
answered Oct 07 '22 03:10 by akrun