 

Finding Overlaps between interval sets / Efficient Overlap Joins


Overview:

I need to join two tables:

ref contains time intervals (from t1 to t2), each with an id and the space in which the interval occurs.

map contains time intervals (t1 to t2), each with a result res and its corresponding space.

I want to join onto ref all intervals of map (together with their res values) that fall within the intervals in ref.

Example:

    library(data.table)
    ref <- data.table(space = rep('nI', 3), t1 = c(100, 300, 500), t2 = c(150, 400, 600), id = letters[1:3])
    map <- data.table(space = rep('nI', 241), t1 = seq(0, 1200, by = 5), t2 = seq(5, 1205, by = 5), res = rnorm(241))

They look like:

    > ref
       space  t1  t2 id
    1:    nI 100 150  a
    2:    nI 300 400  b
    3:    nI 500 600  c
    > map
         space   t1   t2        res
      1:    nI    0    5 -0.7082922
      2:    nI    5   10  1.8251041
      3:    nI   10   15  0.2076552
      4:    nI   15   20  0.8047347
      5:    nI   20   25  2.3388920
     ---
    237:    nI 1180 1185  1.0229284
    238:    nI 1185 1190 -0.3657815
    239:    nI 1190 1195  0.3013489
    240:    nI 1195 1200  1.2947271
    241:    nI 1200 1205 -1.5050221
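To pin down what "fall within" means here, the join can be written as a brute-force loop in base R. This is a minimal sketch for checking results on small inputs, not how data.table does it (it is O(n * m)). The helper name `within_join_naive` is hypothetical, the output column names only imitate the foverlaps layout, and `set.seed(1)` is added only so the example is reproducible.

```r
# Brute-force "within" join in base R (no data.table), for sanity checks only.
set.seed(1)  # assumption: added for reproducibility, not in the original
ref <- data.frame(space = rep("nI", 3), t1 = c(100, 300, 500),
                  t2 = c(150, 400, 600), id = letters[1:3],
                  stringsAsFactors = FALSE)
map <- data.frame(space = rep("nI", 241), t1 = seq(0, 1200, by = 5),
                  t2 = seq(5, 1205, by = 5), res = rnorm(241),
                  stringsAsFactors = FALSE)

# within_join_naive is a hypothetical helper, not part of any package.
within_join_naive <- function(ref, map) {
  pieces <- lapply(seq_len(nrow(ref)), function(k) {
    r <- ref[k, ]
    # a map interval [t1, t2] is "within" a ref interval when it has the same
    # space and lies entirely inside it (endpoints inclusive)
    hit <- map$space == r$space & map$t1 >= r$t1 & map$t2 <= r$t2
    m <- map[hit, ]
    if (nrow(m) == 0) return(NULL)
    data.frame(space = r$space, t1 = r$t1, t2 = r$t2, id = r$id,
               i.t1 = m$t1, i.t2 = m$t2, res = m$res,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, pieces)  # NULL pieces (no matches) are dropped by rbind
}

out <- within_join_naive(ref, map)
table(out$id)  # 10 rows for a, 20 for b, 20 for c
```

The row count (50) matches the foverlaps output in the update further down, which is a cheap way to cross-check an interval join on toy data.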

(UPDATE) Solution

  • ?data.table::foverlaps is the key here.

I need to join all the map intervals that occur "within" the intervals of ref, and I am not interested in map intervals that have no match, so I use nomatch=0L.

    setkey(ref, space, t1, t2)
    foverlaps(map, ref, type = "within", nomatch = 0L)

which gives:

        space  t1  t2 id i.t1 i.t2         res
     1:    nI 100 150  a  100  105 -0.85202726
     2:    nI 100 150  a  105  110  0.79748876
     3:    nI 100 150  a  110  115  1.49894097
     4:    nI 100 150  a  115  120  0.47719957
     5:    nI 100 150  a  120  125 -0.95767896
     6:    nI 100 150  a  125  130 -0.51054673
     7:    nI 100 150  a  130  135 -0.08478700
     8:    nI 100 150  a  135  140 -0.69526566
     9:    nI 100 150  a  140  145  2.14917623
    10:    nI 100 150  a  145  150 -0.05348163
    11:    nI 300 400  b  300  305  0.28834548
    12:    nI 300 400  b  305  310  0.32449616
    13:    nI 300 400  b  310  315  1.16107248
    14:    nI 300 400  b  315  320  1.08550676
    15:    nI 300 400  b  320  325  0.84640788
    16:    nI 300 400  b  325  330 -2.15485447
    17:    nI 300 400  b  330  335  1.59115714
    18:    nI 300 400  b  335  340 -0.57588128
    19:    nI 300 400  b  340  345  0.23957563
    20:    nI 300 400  b  345  350 -0.60824259
    21:    nI 300 400  b  350  355 -0.84828189
    22:    nI 300 400  b  355  360 -0.43528701
    23:    nI 300 400  b  360  365 -0.80026281
    24:    nI 300 400  b  365  370 -0.62914234
    25:    nI 300 400  b  370  375 -0.83485164
    26:    nI 300 400  b  375  380  1.46922713
    27:    nI 300 400  b  380  385 -0.53965310
    28:    nI 300 400  b  385  390  0.98728765
    29:    nI 300 400  b  390  395 -0.66328893
    30:    nI 300 400  b  395  400 -0.08182384
    31:    nI 500 600  c  500  505  0.72566100
    32:    nI 500 600  c  505  510  2.27878366
    33:    nI 500 600  c  510  515  0.72974139
    34:    nI 500 600  c  515  520 -0.35358019
    35:    nI 500 600  c  520  525 -1.20697646
    36:    nI 500 600  c  525  530 -0.01719057
    37:    nI 500 600  c  530  535  0.06686472
    38:    nI 500 600  c  535  540 -0.40866088
    39:    nI 500 600  c  540  545 -1.02697573
    40:    nI 500 600  c  545  550  2.19822065
    41:    nI 500 600  c  550  555  0.57075648
    42:    nI 500 600  c  555  560 -0.52009726
    43:    nI 500 600  c  560  565 -1.82999177
    44:    nI 500 600  c  565  570  2.53776578
    45:    nI 500 600  c  570  575  0.85626293
    46:    nI 500 600  c  575  580 -0.34245708
    47:    nI 500 600  c  580  585  1.21679869
    48:    nI 500 600  c  585  590  1.87587020
    49:    nI 500 600  c  590  595 -0.23325264
    50:    nI 500 600  c  595  600  0.18845022
        space  t1  t2 id i.t1 i.t2         res
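Why type="within" rather than the default type="any": ?foverlaps treats intervals as closed, and as I read it, for an x interval [a, b] (here a map row) and a y interval [c, d] (a ref row), "within" matches when a >= c and b <= d, while "any" matches when c <= b and d >= a. The base R sketch below (not foverlaps itself; the helper names are hypothetical) counts matches per ref interval under both predicates.

```r
# Base R sketch of two foverlaps overlap types on closed intervals:
# x = [a, b] is a map interval, y = [c, d] is a ref interval.
is_within <- function(a, b, c, d) a >= c & b <= d
is_any    <- function(a, b, c, d) c <= b & d >= a

ref_t1 <- c(100, 300, 500); ref_t2 <- c(150, 400, 600)
map_t1 <- seq(0, 1200, by = 5); map_t2 <- map_t1 + 5

# number of map intervals matched by each ref interval under a predicate
count_hits <- function(pred) {
  vapply(seq_along(ref_t1), function(k)
    sum(pred(map_t1, map_t2, ref_t1[k], ref_t2[k])), integer(1))
}

within_counts <- count_hits(is_within)  # 10 20 20
any_counts    <- count_hits(is_any)     # 12 22 22
```

The two extra rows per interval under "any" are the map intervals that only touch a ref boundary, e.g. [95, 100] and [150, 155] for id a, which is why "within" gives exactly the 50 rows shown above.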
asked Sep 12 '14 at 18:09 by npjc


1 Answer

Ha, nice timing :). Just a few days back, overlap joins (or interval joins) were implemented in data.table. The function is foverlaps() and is available from the GitHub project page. Make sure to have a look at ?foverlaps.

    setkey(ref, space, t1, t2)
    foverlaps(map, ref, type = "within", nomatch = 0L)

I think this is what you're after. It returns join results only where there is a match, and it checks for t1, t2 overlaps between ref and map within each space identifier. If you don't want that, just remove space from the key columns. And if you want all rows of map, including those without a match, remove nomatch=0L; the default, nomatch=NA, returns all of them, with NA in the ref columns where there is no match.
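A base R sketch of what the two nomatch settings mean for this data. It mirrors the behaviour described above rather than reproducing foverlaps' exact column layout, and it assumes the ref intervals do not overlap each other, so each map row falls within at most one ref row (`k[1]` keeps the first hit; foverlaps with its default mult="all" would return every hit). The variable names are illustrative only.

```r
set.seed(1)  # assumption: added for reproducibility
ref <- data.frame(space = "nI", t1 = c(100, 300, 500), t2 = c(150, 400, 600),
                  id = letters[1:3], stringsAsFactors = FALSE)
map <- data.frame(space = "nI", t1 = seq(0, 1200, by = 5),
                  t2 = seq(5, 1205, by = 5), res = rnorm(241),
                  stringsAsFactors = FALSE)

# index of the ref row each map row falls within, or NA if none; requiring
# equal space mirrors having space as the first key column
match_ref <- vapply(seq_len(nrow(map)), function(j) {
  k <- which(ref$space == map$space[j] &
             map$t1[j] >= ref$t1 & map$t2[j] <= ref$t2)
  if (length(k) == 0) NA_integer_ else k[1]
}, integer(1))

# like nomatch = NA: every map row kept, ref columns are NA where unmatched
out_na <- data.frame(ref[match_ref, c("t1", "t2", "id")],
                     i.t1 = map$t1, i.t2 = map$t2, res = map$res)
rownames(out_na) <- NULL

# like nomatch = 0L: only the rows that matched
out_0 <- out_na[!is.na(match_ref), ]

c(keep_all = nrow(out_na), matched_only = nrow(out_0))  # 241 vs 50 rows
```

Here 191 of the 241 map intervals fall within no ref interval, so they survive only in the nomatch = NA version.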

The function is new (though rigorously tested) and is therefore not yet feature complete. If you have any suggestions for improvement or come across any issues, please feel free to file an issue.

answered Sep 16 '22 at 13:09 by Arun