
Find overlaps in time intervals by group and return subsetted data.frame

Say I have this dataframe, which has two IDs (1/2) with their start and end times in three different zones (A/B/C):

df <- structure(list(id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2), zone = c("A", 
"B", "A", "C", "B", "A", "B", "A", "B", "C"), start = c(0, 6, 
7, 8, 10, 0, 3, 5, 6, 7), end = c(6, 7, 8, 10, 11, 3, 5, 6, 7, 
11)), row.names = c(NA, -10L), class = "data.frame")

df

   id zone start end
1   1    A     0   6
2   1    B     6   7
3   1    A     7   8
4   1    C     8  10
5   1    B    10  11
6   2    A     0   3
7   2    B     3   5
8   2    A     5   6
9   2    B     6   7
10  2    C     7  11

If we look at each zone, we can visually inspect the times when IDs are in the same zone and when they are not:

split(df,df$zone)

$A
  id zone start end
1  1    A     0   6
3  1    A     7   8
6  2    A     0   3
8  2    A     5   6

$B
  id zone start end
2  1    B     6   7
5  1    B    10  11
7  2    B     3   5
9  2    B     6   7

$C
   id zone start end
4   1    C     8  10
10  2    C     7  11

For example, IDs 1 and 2 are both in zone A from 0-3 and from 5-6, but not at other times.

Desired Output

I want to extract three dataframes.

1: A dataframe showing the times and zones where they are together:

  zone start end  id
1    A     0   3 1-2
2    A     5   6 1-2
3    B     6   7 1-2
4    C     8  10 1-2

2 & 3: Dataframes for times when they are not together:

#id=1
  zone start end
1    A     3   5
2    A     7   8
3    B    10  11

#id=2
  zone start end
1    B     3   5
2    C     7   8
3    C    10  11

I have been trying to work with foverlaps from data.table and the intervals package, but can't seem to work out the correct method.

e.g. Subsetting each zone/id, I can sort of get an output that includes overlaps, but it doesn't seem to be quite the right direction:

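library(data.table)

# Zone A only, split by id
A <- split(df, df$zone)$A
Asp <- split(A, A$id)
x <- setDT(Asp[[1]])  # id 1
y <- setDT(Asp[[2]])  # id 2

# foverlaps() needs the second table keyed on its interval columns
setkey(y, start, end)

foverlaps(x, y, type = "any")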

   id zone start end i.id i.zone i.start i.end
1:  2    A     0   3    1      A       0     6
2:  2    A     5   6    1      A       0     6
3: NA <NA>    NA  NA    1      A       7     8

Any help greatly appreciated.

EDIT: Extra example dataset that seemed to bring up some issues with current suggested solutions:

df2 <- structure(list(start = c(0, 5, 6, 8, 10, 13, 15, 20, 22, 26, 
       29, 37, 40, 42, 0, 3, 6, 9, 15, 20, 25, 33, 35, 40), id = c(1, 
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 
       2, 2), zone = c("A", "B", "A", "D", "C", "B", "C", "B", "A", 
       "B", "A", "D", "C", "D", "A", "B", "C", "D", "A", "B", "C", "B", 
       "A", "D"), end = c(5, 6, 8, 10, 13, 15, 20, 22, 26, 29, 37, 40, 
       42, 45, 3, 6, 9, 15, 20, 25, 33, 35, 40, 45)), class = c("data.table", "data.frame"), row.names = c(NA, -24L))
          
df2

    start id zone end
 1:     0  1    A   5
 2:     5  1    B   6
 3:     6  1    A   8
 4:     8  1    D  10
 5:    10  1    C  13
 6:    13  1    B  15
 7:    15  1    C  20
 8:    20  1    B  22
 9:    22  1    A  26
10:    26  1    B  29
11:    29  1    A  37
12:    37  1    D  40
13:    40  1    C  42
14:    42  1    D  45
15:     0  2    A   3
16:     3  2    B   6
17:     6  2    C   9
18:     9  2    D  15
19:    15  2    A  20
20:    20  2    B  25
21:    25  2    C  33
22:    33  2    B  35
23:    35  2    A  40
24:    40  2    D  45
    start id zone end

jalapic asked Jul 16 '21 21:07



1 Answer

This seems to work, filtering the foverlaps output:

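library(data.table)

DT = data.table(df)
setkey(DT, start, end)
# join the intervals of id 1 against those of id 2
oDT0 = foverlaps(DT[id==1], DT[id==2])
# bounds of each overlapping stretch
oDT0[, `:=`(
  ostart = pmax(start, i.start),
  oend = pmin(end, i.end)
)]
# drop zero-length overlaps, where intervals only touch at an endpoint
oDT = oDT0[ostart < oend]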

# together
oDT[zone == i.zone, .(ids = '1-2', zone, ostart, oend)]
#    ids zone ostart oend
# 1: 1-2    A      0    3
# 2: 1-2    A      5    6
# 3: 1-2    B      6    7
# 4: 1-2    C      8   10

# apart
oDT[zone != i.zone, .(id, zone, i.id, i.zone, ostart, oend)]
#    id zone i.id i.zone ostart oend
# 1:  2    B    1      A      3    5
# 2:  2    C    1      A      7    8
# 3:  2    C    1      B     10   11
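
The "apart" table above holds both IDs in one row, while the question asks for one table per ID. A minimal sketch of that last step, assuming the same df and the same foverlaps approach as above (the names apart, apart1 and apart2 are only illustrative): the zone column belongs to id 2 and i.zone to id 1, so each per-ID table just picks the matching zone column.

```r
library(data.table)

# Same data as df in the question, rebuilt here so the block runs on its own
df <- data.frame(
  id    = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  zone  = c("A", "B", "A", "C", "B", "A", "B", "A", "B", "C"),
  start = c(0, 6, 7, 8, 10, 0, 3, 5, 6, 7),
  end   = c(6, 7, 8, 10, 11, 3, 5, 6, 7, 11)
)

DT <- data.table(df)
setkey(DT, start, end)

# Overlap join of id 1 (x) against id 2 (y); y's columns keep their names,
# x's columns get the "i." prefix
oDT0 <- foverlaps(DT[id == 1], DT[id == 2])
oDT0[, `:=`(ostart = pmax(start, i.start), oend = pmin(end, i.end))]
oDT <- oDT0[ostart < oend]

# Stretches where the two IDs are in different zones
apart <- oDT[zone != i.zone]

# id 1 was the x table, so its zone is i.zone
apart1 <- apart[, .(zone = i.zone, start = ostart, end = oend)]
# id 2 was the y table, so its zone is zone
apart2 <- apart[, .(zone, start = ostart, end = oend)]

apart1  # zones A, A, B over 3-5, 7-8, 10-11
apart2  # zones B, C, C over 3-5, 7-8, 10-11
```

For df this reproduces the two "not together" tables listed in the question. It only covers the two-ID case; more IDs would need a pairwise loop or a different join.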

Repeating for new input... not sure if it's correct since no expected output was provided:

> DT = data.table(df2)
> ...
> oDT[zone == i.zone, .(ids = '1-2', zone, ostart, oend)]
   ids zone ostart oend
1: 1-2    A      0    3
2: 1-2    B      5    6
3: 1-2    D      9   10
4: 1-2    B     20   22
5: 1-2    A     35   37
6: 1-2    D     42   45
> oDT[zone != i.zone, .(id, zone, i.id, i.zone, ostart, oend)]
    id zone i.id i.zone ostart oend
 1:  2    B    1      A      3    5
 2:  2    C    1      A      6    8
 3:  2    C    1      D      8    9
 4:  2    D    1      C     10   13
 5:  2    D    1      B     13   15
 6:  2    A    1      C     15   20
 7:  2    B    1      A     22   25
 8:  2    C    1      A     25   26
 9:  2    C    1      B     26   29
10:  2    C    1      A     29   33
11:  2    B    1      A     33   35
12:  2    A    1      D     37   40
13:  2    D    1      C     40   42

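I suspect there is a way to pass arguments to foverlaps that avoids defining and filtering by ostart and oend. The filter is needed because foverlaps treats intervals as closed, so intervals that only touch at an endpoint (e.g. 0-6 and 6-7) come back as overlaps of zero length. As of the latest CRAN version of the package, the documentation indicates that minoverlap is not yet implemented, so the manual filter may be necessary for now.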
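
To see why the ostart < oend filter is doing real work, here is a small sketch with made-up intervals (x, y and res are illustrative names, not part of the data above). One interval in y truly overlaps x and the other only touches it at time 6:

```r
library(data.table)

x <- data.table(start = 0, end = 6)
y <- data.table(start = c(0, 6), end = c(3, 7))
setkey(y, start, end)

# x has no key, so foverlaps() matches on y's key columns (start, end)
res <- foverlaps(x, y, type = "any")
res[, `:=`(ostart = pmax(start, i.start), oend = pmin(end, i.end))]

nrow(res)                 # both y intervals are matched, including the touching one
nrow(res[ostart < oend])  # only the interval with a positive-length overlap remains
```

The touching pair gets ostart equal to oend, which is exactly what the strict inequality removes.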

Frank answered Sep 27 '22 19:09