Calculate the number of overlaying date intervals per group

I have the following dataframe df (dput below):

> df
   group       from         to
1      A 2023-03-01 2023-03-02
2      A 2023-03-01 2023-03-03
3      A 2023-03-03 2023-03-07
4      A 2023-03-05 2023-03-08
5      A 2023-03-09 2023-03-10
6      A 2023-03-11 2023-03-11
7      B 2023-03-01 2023-03-02
8      B 2023-03-04 2023-03-06
9      B 2023-03-07 2023-03-07
10     B 2023-03-08 2023-03-11
11     B 2023-03-10 2023-03-12
12     B 2023-03-15 2023-03-16

I would like to calculate the number of overlaying date intervals per group based on the from and to columns. In group A, rows 1 and 2 overlay, and row 3 overlays with rows 2 and 4 (rows 2 and 3 share only the date 2023-03-03, so a shared endpoint counts as an overlay). This means group A has a total of 3 overlaying pairs of intervals. In group B, only rows 10 and 11 overlay. So I would like to have the following output:

  group overlaying_intervals
1     A                    3
2     B                    1

So I was wondering if anyone knows how to calculate the number of overlaying date intervals per group?


dput df:

df <- structure(list(group = c("A", "A", "A", "A", "A", "A", "B", "B", 
"B", "B", "B", "B"), from = c("2023-03-01", "2023-03-01", "2023-03-03", 
"2023-03-05", "2023-03-09", "2023-03-11", "2023-03-01", "2023-03-04", 
"2023-03-07", "2023-03-08", "2023-03-10", "2023-03-15"), to = c("2023-03-02", 
"2023-03-03", "2023-03-07", "2023-03-08", "2023-03-10", "2023-03-11", 
"2023-03-02", "2023-03-06", "2023-03-07", "2023-03-11", "2023-03-12", 
"2023-03-16")), class = "data.frame", row.names = c("1", "2", 
"3", "4", "5", "6", "7", "8", "9", "10", "11", "12"))
Quinten asked Dec 13 '25 11:12


1 Answer

It feels like there should be a more elegant way of achieving this, but my first inclination was to count, for each interval, how many intervals in its group it overlaps, subtract 1 for the overlap with itself, and then halve the group total because each pairwise overlap is counted twice.

library(lubridate)
library(dplyr)
library(purrr)


df %>%
  group_by(group) %>%
  mutate(int = interval(from, to),
         # count overlapping intervals, subtracting overlap with self
         overlays = (map_int(int, ~sum(int_overlaps(.x, int))))-1) %>%
  # divide total by 2 since each pairwise overlap is counted twice
  summarize(overlaying_intervals = sum(overlays)/2)
#> # A tibble: 2 × 2
#>   group overlaying_intervals
#>   <chr>                <dbl>
#> 1 A                        3
#> 2 B                        1

Created on 2023-03-31 with reprex v2.0.2
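
For comparison, here is a minimal base R sketch of the same idea that counts each pair once instead of counting twice and halving. It assumes, as in the expected output above, that two intervals overlap when they share at least one day, endpoints included. The helper name count_overlaps() is made up for illustration and is not part of any package.

```r
# Same example data as in the question
df <- data.frame(
  group = rep(c("A", "B"), each = 6),
  from = c("2023-03-01", "2023-03-01", "2023-03-03", "2023-03-05",
           "2023-03-09", "2023-03-11", "2023-03-01", "2023-03-04",
           "2023-03-07", "2023-03-08", "2023-03-10", "2023-03-15"),
  to = c("2023-03-02", "2023-03-03", "2023-03-07", "2023-03-08",
         "2023-03-10", "2023-03-11", "2023-03-02", "2023-03-06",
         "2023-03-07", "2023-03-11", "2023-03-12", "2023-03-16")
)

# Hypothetical helper: number of overlapping pairs among a set of intervals
count_overlaps <- function(from, to) {
  # work on day numbers so the comparisons are plain numeric
  f <- as.numeric(as.Date(from))
  t <- as.numeric(as.Date(to))
  # ovl[i, j] is TRUE when interval i starts no later than interval j ends
  # and ends no earlier than interval j starts (endpoints inclusive)
  ovl <- outer(f, t, "<=") & outer(t, f, ">=")
  # the upper triangle holds every unordered pair i < j exactly once,
  # so no self-overlap correction or halving is needed
  sum(ovl[upper.tri(ovl)])
}

# apply the helper to each group separately
counts <- sapply(split(df, df$group), function(g) count_overlaps(g$from, g$to))
result <- data.frame(group = names(counts),
                     overlaying_intervals = unname(counts))
result
```

On the example data this should give 3 for group A and 1 for group B, matching the lubridate version. Like that version, it compares every pair within a group, so the cost grows quadratically with group size.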

Seth answered Dec 15 '25 19:12


