I have a number of intervals and need to find which ones form a continuous group.
In this MWE, I have Interval.id, Interval.start, and Interval.end, and I want to calculate Wanted.column.
library(data.table)
DT <- data.table(Interval.id    = c(1L, 2L, 3L, 4L, 5L, 6L),
                 Interval.start = c(2.0, 3.0, 4.0, 4.6, 4.7, 5.5),
                 Interval.end   = c(4.5, 3.5, 4.8, 5.0, 4.9, 8.0),
                 Wanted.column  = c(1L, 1L, 1L, 1L, 1L, 2L))
I suppose foverlaps is the friend here, but I can't see how.
How can Wanted.column be calculated?
1) Sort all intervals in increasing order of start time. This step takes O(n log n) time.
2) In the sorted array, if the start time of an interval is less than the end of the previous interval, then there is an overlap.
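The two steps above can be sketched in base R as follows. This is only an illustration: the helper name `group_intervals` is hypothetical and not part of the original posts. It also assumes at least one interval and that intervals which merely touch belong to the same group. To assign group labels (rather than just detect an overlap), the sketch compares each start with the largest end seen among all earlier intervals, not only with the end of the immediately preceding one, because an earlier long interval can still cover a later start.

```r
# Hypothetical helper (an assumption for illustration): label each interval
# with the id of the continuous group it belongs to.
group_intervals <- function(start, end) {
  ord <- order(start)                      # step 1: sort by start
  s <- start[ord]
  e <- end[ord]
  # largest end among all earlier intervals (the first interval uses its own end)
  reach <- c(e[1], cummax(e)[-length(e)])
  # step 2: a new group begins where a start lies beyond that reach
  g_sorted <- cumsum(s > reach) + 1L
  g <- integer(length(start))
  g[ord] <- g_sorted                       # restore the input order
  g
}

group_intervals(c(2.0, 3.0, 4.0, 4.6, 4.7, 5.5),
                c(4.5, 3.5, 4.8, 5.0, 4.9, 8.0))
# 1 1 1 1 1 2
```

With the MWE data the third interval starts at 4.0, after the second one ends at 3.5, yet it is still covered by the first interval, which ends at 4.5. That is why the running maximum is used.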
If two ranges have at least one point in common, we say that they overlap. Non-overlapping ranges, on the other hand, have no points in common.
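The overlap definition can be written as a one-line check. This is a minimal sketch that assumes closed ranges (end points included) and uses a hypothetical helper name, `ranges_overlap`: two ranges share a point exactly when each one starts no later than the other ends.

```r
# Hypothetical helper (an assumption for illustration): TRUE when the closed
# ranges [a_start, a_end] and [b_start, b_end] have at least one common point.
ranges_overlap <- function(a_start, a_end, b_start, b_end) {
  a_start <= b_end & b_start <= a_end
}

ranges_overlap(2.0, 4.5, 3.0, 3.5)  # TRUE: the second range lies inside the first
ranges_overlap(4.7, 4.9, 5.5, 8.0)  # FALSE: there is a gap between 4.9 and 5.5
```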
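# Assumes DT is sorted by Interval.start: a new group starts whenever an
# interval begins after the largest end seen among all earlier intervals.
DT[ , g := cumsum(
  cummax(shift(Interval.end, fill = Interval.end[1])) < Interval.start) + 1]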
# Interval.id Interval.start Interval.end Wanted.column g
# 1: 1 2.0 4.5 1 1
# 2: 2 3.0 3.5 1 1
# 3: 3 4.0 4.8 1 1
# 4: 4 4.6 5.0 1 1
# 5: 5 4.7 4.9 1 1
# 6: 6 5.5 8.0 2 2
Credit to highly related answers: Collapse rows with overlapping ranges, How to flatten / merge overlapping time periods
You can first create a data.table with the unique/grouped intervals, and then use foverlaps() to perform a join.
The main-interval data.table can be created using the intervals package. Use the interval_union() function to 'merge' intervals into non-overlapping intervals.
#use the intervals-package to create the "main" unique intervals
library( intervals )
DT.int <- as.data.table(
intervals::interval_union(
intervals::Intervals( as.matrix( DT[, 2:3] ) ) ,
check_valid = TRUE ) )
#set names
setnames( DT.int, names(DT.int), c("start", "end" ) )
#set group_id-column
DT.int[, group_id := .I ][]
# start end group_id
# 1: 2.0 5 1
# 2: 5.5 8 2
#now perform foverlaps()
setkey( DT, Interval.start, Interval.end)
setkey( DT.int, start, end)
foverlaps( DT.int, DT )
# Interval.id Interval.start Interval.end Wanted.column start end group_id
# 1: 1 2.0 4.5 1 2.0 5 1
# 2: 2 3.0 3.5 1 2.0 5 1
# 3: 3 4.0 4.8 1 2.0 5 1
# 4: 4 4.6 5.0 1 2.0 5 1
# 5: 5 4.7 4.9 1 2.0 5 1
# 6: 6 5.5 8.0 2 5.5 8 2
As you can see, the column group_id matches your Wanted.column.