
Find groups of overlapping intervals with data.table

Tags: r, data.table

I have a number of intervals and need to find which ones form a continuous group.

In this MWE, I have Interval.id, Interval.start, and Interval.end, and I want to calculate Wanted.column.

library(data.table)

DT <- data.table(Interval.id=c(1L, 2L, 3L, 4L, 5L, 6L),
                 Interval.start=c(2.0, 3.0, 4.0, 4.6, 4.7, 5.5),
                 Interval.end=c(4.5, 3.5, 4.8, 5.0, 4.9, 8.0),
                 Wanted.column=c(1L, 1L, 1L, 1L, 1L, 2L))

I suppose foverlaps is the friend here, but I can't see how.

How can Wanted.column be calculated?

Chris asked Sep 29 '19 06:09


2 Answers

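# Assumes DT is sorted by Interval.start (as it is in the example).
# A new group starts when a start lies beyond every earlier end.
DT[ , g := cumsum(
  cummax(shift(Interval.end, fill = Interval.end[1])) < Interval.start) + 1]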

#    Interval.id Interval.start Interval.end Wanted.column   g
# 1:           1            2.0          4.5             1   1
# 2:           2            3.0          3.5             1   1
# 3:           3            4.0          4.8             1   1
# 4:           4            4.6          5.0             1   1
# 5:           5            4.7          4.9             1   1
# 6:           6            5.5          8.0             2   2
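
To make the one-liner easier to follow, here is a rough step-by-step sketch of the same idea. The helper columns prev.max.end and new.group are names made up for illustration, and the setorder() call is an added assumption so that the rows are sorted by start before the running maximum is taken.

```r
library(data.table)

DT <- data.table(Interval.id    = 1:6,
                 Interval.start = c(2.0, 3.0, 4.0, 4.6, 4.7, 5.5),
                 Interval.end   = c(4.5, 3.5, 4.8, 5.0, 4.9, 8.0))

# The approach relies on the rows being ordered by start (assumption made explicit here)
setorder(DT, Interval.start)

# Furthest end reached by any earlier row; row 1 is compared with its own end
DT[, prev.max.end := cummax(shift(Interval.end, fill = Interval.end[1]))]

# A new group begins where an interval starts after everything seen so far
DT[, new.group := prev.max.end < Interval.start]

# Counting the breaks gives the group number (hypothetical helper columns kept for inspection)
DT[, g := cumsum(new.group) + 1L]
```

Because the comparison is a strict `<`, an interval that starts exactly where the previous group ends stays in that group.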

Credit to highly related answers: Collapse rows with overlapping ranges, How to flatten / merge overlapping time periods

Henrik answered Nov 14 '22 23:11


You can first create a data.table of the merged (non-overlapping) intervals and then use foverlaps() to join it back to the original data. The merged intervals can be built with the intervals package: its interval_union() function merges overlapping intervals into non-overlapping ones.

#use the intervals-package to create the "main" unique intervals
library( intervals )
DT.int <- as.data.table(
  intervals::interval_union( 
    intervals::Intervals( as.matrix( DT[, 2:3] ) ) , 
    check_valid = TRUE ) )
#set names
setnames( DT.int, names(DT.int), c("start", "end" ) )
#set group_id-column
DT.int[, group_id := .I ][]
#    start end group_id
# 1:   2.0   5        1
# 2:   5.5   8        2

#now perform foverlaps()
setkey( DT, Interval.start, Interval.end)
setkey( DT.int, start, end)
foverlaps( DT.int, DT )

#    Interval.id Interval.start Interval.end Wanted.column start end group_id
# 1:           1            2.0          4.5             1   2.0   5        1
# 2:           2            3.0          3.5             1   2.0   5        1
# 3:           3            4.0          4.8             1   2.0   5        1
# 4:           4            4.6          5.0             1   2.0   5        1
# 5:           5            4.7          4.9             1   2.0   5        1
# 6:           6            5.5          8.0             2   5.5   8        2

As you can see, the column group_id matches your Wanted.column.
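
foverlaps() returns a new table. If you would rather add the group to DT itself, one possible sketch is a non-equi update join. To keep the example self-contained, DT.int is hard-coded here from the merged intervals printed above instead of being computed with the intervals package; that shortcut is an assumption of this sketch.

```r
library(data.table)

DT <- data.table(Interval.id    = 1:6,
                 Interval.start = c(2.0, 3.0, 4.0, 4.6, 4.7, 5.5),
                 Interval.end   = c(4.5, 3.5, 4.8, 5.0, 4.9, 8.0))

# Merged intervals, copied by hand from the interval_union() output shown above
DT.int <- data.table(start = c(2.0, 5.5), end = c(5.0, 8.0), group_id = 1:2)

# Non-equi update join: each interval gets the id of the merged interval containing it
DT[DT.int,
   on = .(Interval.start >= start, Interval.end <= end),
   group_id := i.group_id]
```

This updates DT by reference and leaves its row order and key untouched.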

Wimpel answered Nov 14 '22 23:11