Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Lubridate - Find overlap time between interval and a date

Tags:

r

lubridate

I have a dataframe with shift start and end in datetime format like so:

shift_time <- data.frame(
  started_at = c("2019-09-01 02:00:00 AEST", "2019-09-02 05:00:00 AEST", "2019-11-04 20:00:00 AEDT"),
  ended_at = c("2019-09-01 11:30:00 AEST", "2019-09-02 19:00:00 AEST", "2019-11-05 04:00:00 AEDT")
)

I have another dataframe with public holiday dates like so:

public_holidays <- data.frame(
  hol_name = c('Cup Day', 'Christmas'),
  date = c("2019-11-05", "2019-12-25")
)

I want to update the shift_time df with a new column recording the number of hours of the shift that took place on a public holiday - i.e., I want to calculate the overlap (in hours) between the shift interval and any public holiday that applies. In the above example, the expected values for the new variable would be 0, 0, 4.

Is there a way to do this that does not involve creating a lot of new variables (eg, difftimes, intervals, matched dates)?

like image 600
userLL Avatar asked Oct 23 '19 06:10

userLL


1 Answers

There is the built-in lubridate::int_overlaps but that only returns a logical, not how long they overlap. Luckily, the intersection function has a method for Interval objects. The only trick is that if there's no overlap, it returns length-NA, not length-0. So we can wrap up that logic like this:

library(lubridate)

int_overlaps_numeric <- function (int1, int2) {
  stopifnot(c(is.interval(int1), is.interval(int2)))

  x <- intersect(int1, int2)@.Data
  x[is.na(x)] <- 0
  as.duration(x)
}

This constructs the interval that is the overlap, then extracts the length of it (in seconds). If it's NA, change it to zero, and return. as.duration just gives us pretty-printing. Now you just have to give it two intervals:

int1 <- as.interval(5, Sys.time())
int2 <- as.interval(5, Sys.time()+3)

int_overlaps_numeric(int1, int2)
"1.99299597740173s"

So you need to get all your holidays into intervals, and all your shifts into intervals. Presumably you want to associate these overlaps with other data in the shift_time dataframe, so we'll use dplyr to do all our work inside there. However, you want to check each shift against a vector of all holidays, so we should add another helper function (using purrr::map).

library(dplyr)
library(purrr)

check_shift_against_holidays <- function(shift, holidays) {
  map(shift, ~sum(int_overlaps_numeric(.x, holidays))) %>% 
    unlist() %>% 
    as.duration()
}

This function takes two vectors of intervals. For each element of the first vector, it counts overlaps with every element of the second vector, then adds them up. Then turn it from a list back into a vector, and reclass it as a duration for pretty-printing. The caveat here is that if there are any overlaps in the holidays vector, those hours will be double-counted.

                               # days(1) since the holiday lasts all day
holiday_intervals <- as.interval(days(1), ymd(public_holidays$date))

shift_time %>% 
  mutate(
    shift = interval(ymd_hms(started_at), ymd_hms(ended_at)),
    holiday_hours = check_shift_against_holidays(shift, holiday_intervals)
  )
                started_at                 ended_at                                            shift     holiday_hours
1 2019-09-01 02:00:00 AEST 2019-09-01 11:30:00 AEST 2019-09-01 02:00:00 UTC--2019-09-01 11:30:00 UTC                0s
2 2019-09-02 05:00:00 AEST 2019-09-02 19:00:00 AEST 2019-09-02 05:00:00 UTC--2019-09-02 19:00:00 UTC                0s
3 2019-11-04 20:00:00 AEDT 2019-11-05 04:00:00 AEDT 2019-11-04 20:00:00 UTC--2019-11-05 04:00:00 UTC 14400s (~4 hours)

And if you're really opposed to creating any new intermediate variables:

shift_time %>% 
  mutate(
    holiday_hours = check_shift_against_holidays(
      ymd_hms(started_at) %--% ymd_hms(ended_at), 
      holiday_intervals
      )
  )
                started_at                 ended_at     holiday_hours
1 2019-09-01 02:00:00 AEST 2019-09-01 11:30:00 AEST                0s
2 2019-09-02 05:00:00 AEST 2019-09-02 19:00:00 AEST                0s
3 2019-11-04 20:00:00 AEDT 2019-11-05 04:00:00 AEDT 14400s (~4 hours)
like image 166
Brian Avatar answered Nov 07 '22 08:11

Brian