Non-equi join, then summarize by group

Q: What is a non-equi join in SQL?

A non-equi join is a join whose condition uses a comparison operator such as >, <, >=, or <= instead of the equals sign.

Q: What counts as a match in a non-equi join?

A pair of rows matches when the inequality used in the join condition evaluates to true. Retrieving data from multiple tables based on any condition other than equality is called a non-equi join. The condition can use operators such as <, >, <=, >=, and BETWEEN.
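The same idea carries over to data.table in R, where the inequality goes into the `on` argument. The sketch below is only an illustration: the tables `readings` and `limits` and their contents are hypothetical and do not come from the question. It keeps the rows of `readings` whose `id` equals the one in `limits` and whose `seq` is strictly below `fil`.

```r
library(data.table)

# Hypothetical toy tables, used only to illustrate the matching rule
readings <- data.table(id = c(1L, 1L, 1L), seq = 1:3, val = c(10, 20, 30))
limits   <- data.table(id = 1L, fil = 3L)

# Non-equi join: equal id AND seq strictly less than fil.
# x.seq is the original seq from readings, i.fil is the bound from limits.
matched <- readings[limits,
                    .(id, seq = x.seq, fil = i.fil, val),
                    on = .(id, seq < fil)]
print(matched)
```

Only the rows with `seq` equal to 1 and 2 satisfy `seq < 3`, so the row with `seq` equal to 3 is left out.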

Tags:

r

data.table

Here is an MWE.

library(data.table)

dta <- data.table(id = rep(1:2, each = 5), seq = rep(1:5, 2), val = 1:10)
dtb <- data.table(id = c(1, 1, 2, 2), fil = c(2, 3, 3, 4))
dtc <- data.table(id = c(1, 1, 2, 2), mval = rep(0, 4))
# For each row of dtb, average the dta values with the same id and seq below fil
for (ind in 1:4) dtc$mval[ind] <- mean(dta$val[dta$id == dtb$id[ind] & dta$seq < dtb$fil[ind]])

dtc
#    id      mval
# 1:  1       1.0
# 2:  1       1.5
# 3:  2       6.5
# 4:  2       7.0

dtc should have the same number of rows as dtb. For every row ind in dtc:

1. dtc$id[ind] = dtb$id[ind].
2. dtc$mval[ind] = mean(dta$val[x]), where x is dta$id == dtb$id[ind] & dta$seq < dtb$fil[ind].

My data.tables are extremely large. Hence, I am looking for a way to achieve the above with minimal memory footprint. I was thinking a non-equi join and then a summarize, but I can't seem to get that to work. Hence, the title of the question.

Would greatly appreciate any help, thanks!


asked Sep 18 '16 07:09

Anirban Mukherjee

1 Answer

Maybe this helps:

dtc[, mval := dta[dtb, mean(val), on = .(id, seq < fil), by = .EACHI]$V1]
dtc
#   id mval
#1:  1  1.0
#2:  1  1.5
#3:  2  6.5
#4:  2  7.0
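As a possible variation on the line above, the result can be built straight from the join instead of being assigned into a pre-allocated dtc. This is a sketch under assumptions: the names `res` and `dtc2` are introduced here for illustration, and it relies on `by = .EACHI` evaluating `mean(val)` once per row of dtb, which should avoid materializing the full joined table.

```r
library(data.table)

dta <- data.table(id = rep(1:2, each = 5), seq = rep(1:5, 2), val = 1:10)
dtb <- data.table(id = c(1, 1, 2, 2), fil = c(2, 3, 3, 4))

# One group per row of dtb: rows of dta with the same id and seq < fil
res <- dta[dtb, .(mval = mean(val)), on = .(id, seq < fil), by = .EACHI]

# In the join result the column named seq holds dtb's fil values,
# so keep only id and the aggregated mean
dtc2 <- res[, .(id, mval)]
print(dtc2)
```

Naming the aggregate inside `.()` avoids the default `V1` column, and `dtc2` has one row per row of dtb, in dtb's order.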

answered Oct 07 '22 03:10

akrun
