I have a group of intervals for different ids. For example:
df <- data.frame(id=c(rep("a",4),rep("b",2),rep("c",3)), start=c(100,250,400,600,150,610,275,600,700), end=c(200,300,550,650,275,640,325,675,725))
The intervals of each id do not overlap but the intervals of the different ids may overlap. Here is a picture:
plot(range(df[,c(2,3)]),c(1,nrow(df)),type="n",xlab="",ylab="",yaxt="n")
for (ii in seq_len(nrow(df))) lines(c(df[ii,2],df[ii,3]),rep(nrow(df)-ii+1,2),col=as.numeric(factor(df$id))[ii],lwd=2)
legend("bottomleft",lwd=2,col=seq_along(levels(factor(df$id))),legend=levels(factor(df$id)))
I'm looking for two functions:
1. A function that takes the union of these intervals. For the example above, it would return this data.frame:
union.df <- data.frame(id=rep("a,b,c",4), start=c(100,400,600,700), end=c(325,550,675,725))
2. A function that takes the intersection of these intervals, keeping only the regions covered by every id. For the example above, it would return this data.frame:
intersection.df <- data.frame(id="a,b,c", start=610, end=640)
The intervals package solves the union part of the question:
library(intervals)
idf <- Intervals(df[,2:3])
as.data.frame(interval_union(idf))
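If installing a package is not an option, the union can also be computed in base R. The following is only a minimal sketch, not part of the intervals package: `merge_intervals` is a hypothetical helper name, and it assumes closed intervals, so intervals that merely touch are merged.

```r
df <- data.frame(id = c(rep("a", 4), rep("b", 2), rep("c", 3)),
                 start = c(100, 250, 400, 600, 150, 610, 275, 600, 700),
                 end   = c(200, 300, 550, 650, 275, 640, 325, 675, 725))

# Hypothetical helper: merge overlapping intervals given as start/end vectors.
merge_intervals <- function(start, end) {
  o <- order(start, end)
  start <- start[o]
  end <- end[o]
  # Furthest end reached by all intervals before the current one
  reach <- c(-Inf, head(cummax(end), -1))
  # A new merged interval begins wherever a start lies beyond that reach
  grp <- cumsum(start > reach)
  data.frame(start = as.numeric(tapply(start, grp, min)),
             end   = as.numeric(tapply(end, grp, max)))
}

union_res <- merge_intervals(df$start, df$end)
union_res
```

For the example data this should give the same four intervals as `union.df` in the question.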
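The intersection depends on whether the endpoints are treated as open or closed. With fully closed intervals, b's 150-275 and c's 275-325 share the single point 275, which also lies inside a's 250-300, so the result would contain a zero-width interval at 275. Making the left endpoints open avoids that: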
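idl <- lapply(unique(df$id), function(x) {
  # Intervals for one id, converted so each endpoint's closure can be set
  iv <- as(Intervals(df[df$id == x, 2:3]), "Intervals_full")
  closed(iv)[, 1] <- FALSE  # make the left endpoints open
  iv
})
# Intersect the per-id interval sets one after another
idt <- idl[[1]]
for (i in idl) idt <- interval_intersection(idt, i)
res <- as.data.frame(idt)
res
#    V1  V2
# 1 610 640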
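The same result can be reached without the package by intersecting the ids pairwise. This is a rough base R sketch under stated assumptions: `intersect_two` is a hypothetical helper, each id's intervals are assumed not to overlap one another, and pairs that only touch at a single point are dropped (the same effect as opening one endpoint above).

```r
df <- data.frame(id = c(rep("a", 4), rep("b", 2), rep("c", 3)),
                 start = c(100, 250, 400, 600, 150, 610, 275, 600, 700),
                 end   = c(200, 300, 550, 650, 275, 640, 325, 675, 725))

# Hypothetical helper: intersect two sets of intervals, each given as a
# two-column matrix (start, end).
intersect_two <- function(a, b) {
  lo <- outer(a[, 1], b[, 1], pmax)  # later of the two starts, for every pair
  hi <- outer(a[, 2], b[, 2], pmin)  # earlier of the two ends, for every pair
  keep <- lo < hi                    # pairs with a non-empty overlap
  out <- cbind(start = lo[keep], end = hi[keep])
  out[order(out[, 1]), , drop = FALSE]
}

# One start/end matrix per id, folded together with Reduce
per_id <- lapply(split(df[, c("start", "end")], df$id), as.matrix)
inter_res <- Reduce(intersect_two, per_id)
inter_res
```

Comparing every pair costs time proportional to the product of the two set sizes, which is fine for small inputs like this one but not for large interval sets.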
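This is a bit awkward, but the idea is to unroll the data into a series of opening and closing events and then track how many intervals are open at each position. The union is wherever at least one interval is open; the intersection is wherever as many intervals are open as there are groups. This assumes the intervals within each group do not overlap.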
df <- data.frame(id=c(rep("a",4),rep("b",2),rep("c",3)), start=c(100,250,400,600,150,610,275,600,700), end=c(200,300,550,650,275,640,325,675,725))
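sets <- function(start, end, group, overlap = length(unique(group))) {
  # One row per endpoint: +1 where an interval opens, -1 where it closes
  dd <- rbind(data.frame(pos = start, event = 1), data.frame(pos = end, event = -1))
  # Net change at each distinct position, sorted left to right
  dd <- aggregate(event ~ pos, dd, sum)
  dd <- dd[order(dd$pos), ]
  # Number of intervals open just after each position
  dd$open <- cumsum(dd$event)
  # Runs of positions where at least `overlap` intervals are open
  r <- rle(dd$open >= overlap)
  ex <- cumsum(r$lengths)    # index of the last position in each run
  sx <- ex - r$lengths + 1   # index of the first position in each run
  # A region starts at a run's first position and ends at the position after its last
  cbind(dd$pos[sx[r$values]], dd$pos[ex[r$values] + 1])
}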
#union
with(df, sets(start, end, id,1))
# [,1] [,2]
# [1,] 100 325
# [2,] 400 550
# [3,] 600 675
# [4,] 700 725
#overlap
with(df, sets(start, end, id,3))
# [,1] [,2]
# [1,] 610 640
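The matrices returned above can be reshaped into the data.frame layout the question asks for. A small sketch, where `as_interval_df` is a hypothetical helper and the union matrix is copied by hand from the output above:

```r
# Hypothetical helper: label a two-column interval matrix with the ids it covers.
as_interval_df <- function(m, ids) {
  label <- paste(sort(unique(ids)), collapse = ",")
  data.frame(id = rep(label, nrow(m)), start = m[, 1], end = m[, 2])
}

# Union result from the output above, entered by hand for illustration
union_m <- cbind(c(100, 400, 600, 700), c(325, 550, 675, 725))
union_out <- as_interval_df(union_m, c("a", "b", "c"))
union_out
```

In practice the first argument would be the result of `with(df, sets(start, end, id, 1))` and the second would be `df$id`.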