
Union and intersection of intervals

Tags: r, intervals

I have a group of intervals for different ids. For example:

df <- data.frame(id=c(rep("a",4),rep("b",2),rep("c",3)), start=c(100,250,400,600,150,610,275,600,700), end=c(200,300,550,650,275,640,325,675,725))

The intervals of each id do not overlap but the intervals of the different ids may overlap. Here is a picture:

ids <- factor(df$id)  # explicit factor: since R 4.0.0, data.frame() keeps strings as character
plot(range(df[,c(2,3)]),c(1,nrow(df)),type="n",xlab="",ylab="",yaxt="n")
for ( ii in 1:nrow(df) ) lines(c(df[ii,2],df[ii,3]),rep(nrow(df)-ii+1,2),col=as.numeric(ids[ii]),lwd=2)
legend("bottomleft",lwd=2,col=seq_along(levels(ids)),legend=levels(ids))

What I'm looking for is two functions:

1. A function which takes the union of these intervals. For the example above, it would return this data.frame:

union.df <- data.frame(id=rep("a,b,c",4), start=c(100,400,600,700), end=c(325,550,675,725))

2. A function which intersects these intervals, keeping a range only if all the ids overlap over that range. For the example above, it would return this data.frame:

intersection.df <- data.frame(id="a,b,c", start=610, end=640)

asked May 06 '15 14:05 by user1701545


2 Answers

The intervals package solves the union part of the question:

require(intervals)
idf <- Intervals(df[,2:3])
as.data.frame(interval_union(idf))

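For the intersection part, intersect the interval sets of the individual ids one after another. The result depends on whether endpoints count as part of an interval; the code here treats the left endpoints as open, so intervals that merely touch (one ends at 275, another starts at 275) do not count as overlapping: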

idl <- lapply(unique(df$id), function(x) {
  var <- as(Intervals(df[df$id == x, 2:3]), "Intervals_full")
  closed(var)[, 1] <- FALSE  # make the left endpoints open
  var
})
idt <- idl[[1]]
for (i in idl) idt <- interval_intersection(idt, i)
res <- as.data.frame(idt)
res
   V1  V2
1 610 640
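
If the intervals package is not available, the same intersection step can be approximated in base R. This is only a sketch of a different technique (pairwise comparison folded with `Reduce`), not the package's method; the helper name `intersect_two` is hypothetical, and it assumes that intervals which only touch at a single point do not overlap.

```r
df <- data.frame(id = c(rep("a", 4), rep("b", 2), rep("c", 3)),
                 start = c(100, 250, 400, 600, 150, 610, 275, 600, 700),
                 end   = c(200, 300, 550, 650, 275, 640, 325, 675, 725))

# Hypothetical helper: intersect two sets of disjoint intervals,
# each given as a data.frame with columns start and end.
intersect_two <- function(x, y) {
  # Every pairing of one interval from x with one interval from y.
  idx <- expand.grid(i = seq_len(nrow(x)), j = seq_len(nrow(y)))
  s <- pmax(x$start[idx$i], y$start[idx$j])
  e <- pmin(x$end[idx$i],   y$end[idx$j])
  # Assumption: empty and single-point overlaps are dropped.
  keep <- s < e
  out <- data.frame(start = s[keep], end = e[keep])
  out[order(out$start), , drop = FALSE]
}

# One data.frame of intervals per id, folded pairwise.
per_id <- split(df[, c("start", "end")], df$id)
res_base <- Reduce(intersect_two, per_id)
print(res_base)
```

The pairwise comparison is quadratic in the number of intervals per id, so it is only reasonable for small inputs like this one.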
answered Oct 09 '22 03:10 by Nightwriter


This is a bit awkward, but the idea is that you unroll the data into a series of opening and closing events. Then you track how many intervals are open at each position. This assumes that the intervals within each group do not overlap.
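
To make the event counting concrete, here is a minimal sketch on a made-up toy input (the three intervals are hypothetical and not taken from the question): [1,4] and [5,8] from one group, [3,6] from another.

```r
# Hypothetical toy input: two intervals from one group, one from another.
start <- c(1, 5, 3)
end   <- c(4, 8, 6)

# +1 when an interval opens, -1 when it closes.
events <- rbind(data.frame(pos = start, event = 1),
                data.frame(pos = end,   event = -1))

# Net change per distinct position, sorted, then a running count
# of how many intervals are open from that position onwards.
events <- aggregate(event ~ pos, events, sum)
events <- events[order(events$pos), ]
events$open <- cumsum(events$event)
print(events)
```

The `open` column reads 1, 2, 1, 2, 1, 0 for positions 1, 3, 4, 5, 6, 8. At least one interval is open from 1 to 8 (the union), and two are open from 3 to 4 and from 5 to 6 (the overlap of both groups).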

df <- data.frame(id=c(rep("a",4),rep("b",2),rep("c",3)), start=c(100,250,400,600,150,610,275,600,700), end=c(200,300,550,650,275,640,325,675,725))


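sets<-function(start, end, group, overlap=length(unique(group))) {
    # +1 for each interval that opens, -1 for each one that closes
    dd<-rbind(data.frame(pos=start, event=1), data.frame(pos=end, event=-1))
    # net change at each distinct position, in increasing order
    dd<-aggregate(event~pos, dd, sum)
    dd<-dd[order(dd$pos),]
    # number of intervals open from each position onwards
    dd$open <- cumsum(dd$event)
    # runs of positions where at least `overlap` intervals are open
    r<-rle(dd$open>=overlap)
    ex<-cumsum(r$lengths)   # index of the last position in each run
    sx<-ex-r$lengths+1      # index of the first position in each run
    # a range starts at the run's first position and ends at the position after its last
    cbind(dd$pos[sx[r$values]],dd$pos[ex[r$values]+1])
}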

#union
with(df, sets(start, end, id,1))
#     [,1] [,2]
# [1,]  100  325
# [2,]  400  550
# [3,]  600  675
# [4,]  700  725

#overlap
with(df, sets(start, end, id,3))
#      [,1] [,2]
# [1,]  610  640
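
The question asked for a data.frame with an `id` label rather than a matrix. The following is a sketch of one way to get that shape with the same event-counting sweep; the function name `interval_sets` and its `min_open` argument are hypothetical, and the label is simply all ids pasted together, as in the question's expected output.

```r
df <- data.frame(id = c(rep("a", 4), rep("b", 2), rep("c", 3)),
                 start = c(100, 250, 400, 600, 150, 610, 275, 600, 700),
                 end   = c(200, 300, 550, 650, 275, 640, 325, 675, 725))

# Hypothetical wrapper: min_open = 1 gives the union; the default
# (number of distinct ids) gives the intersection across all ids.
# Assumes the intervals within each id do not overlap.
interval_sets <- function(df, min_open = length(unique(df$id))) {
  ev <- rbind(data.frame(pos = df$start, event = 1),
              data.frame(pos = df$end,   event = -1))
  ev <- aggregate(event ~ pos, ev, sum)
  ev <- ev[order(ev$pos), ]
  ev$open <- cumsum(ev$event)

  # Runs of consecutive positions with enough open intervals.
  r <- rle(ev$open >= min_open)
  run_end   <- cumsum(r$lengths)
  run_start <- run_end - r$lengths + 1

  s <- ev$pos[run_start[r$values]]
  e <- ev$pos[run_end[r$values] + 1]
  label <- paste(sort(unique(as.character(df$id))), collapse = ",")
  data.frame(id = rep(label, length(s)), start = s, end = e)
}

union_df        <- interval_sets(df, min_open = 1)
intersection_df <- interval_sets(df)
print(union_df)
print(intersection_df)
```

Because an end and a start at the same position (275 here) cancel out in the aggregation, touching intervals are merged in the union and do not produce a zero-length range in the intersection.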
answered Oct 09 '22 02:10 by MrFlick