I have a dataframe with shift start and end in datetime format like so:
shift_time <- data.frame(
started_at = c("2019-09-01 02:00:00 AEST", "2019-09-02 05:00:00 AEST", "2019-11-04 20:00:00 AEDT"),
ended_at = c("2019-09-01 11:30:00 AEST", "2019-09-02 19:00:00 AEST", "2019-11-05 04:00:00 AEDT")
)
I have another dataframe with public holiday dates like so:
public_holidays <- data.frame(
hol_name = c('Cup Day', 'Christmas'),
date = c("2019-11-05", "2019-12-25")
)
I want to update the shift_time df with a new column recording the number of hours of the shift that took place on a public holiday - i.e., I want to calculate the overlap (in hours) between the shift interval and any public holiday that applies. In the above example, the expected values for the new variable would be 0, 0, 4.
Is there a way to do this that does not involve creating a lot of new variables (eg, difftimes, intervals, matched dates)?
There is the built-in lubridate::int_overlaps
but that only returns a logical, not how long they overlap. Luckily, the intersection
function has a method for Interval
objects. The only trick is that if there's no overlap, it returns length-NA
, not length-0
. So we can wrap up that logic like this:
library(lubridate)
int_overlaps_numeric <- function (int1, int2) {
stopifnot(c(is.interval(int1), is.interval(int2)))
x <- intersect(int1, int2)@.Data
x[is.na(x)] <- 0
as.duration(x)
}
This constructs the interval that is the overlap, then extracts the length of it (in seconds). If it's NA
, change it to zero, and return. as.duration
just gives us pretty-printing. Now you just have to give it two intervals:
int1 <- as.interval(5, Sys.time())
int2 <- as.interval(5, Sys.time()+3)
int_overlaps_numeric(int1, int2)
"1.99299597740173s"
So you need to get all your holidays into intervals, and all your shifts into intervals. Presumably you want to associate these overlaps with other data in the shift_time
dataframe, so we'll use dplyr
to do all our work inside there. However, you want to check each shift against a vector of all holidays, so we should add another helper function (using purrr::map
).
library(dplyr)
library(purrr)
check_shift_against_holidays <- function(shift, holidays) {
map(shift, ~sum(int_overlaps_numeric(.x, holidays))) %>%
unlist() %>%
as.duration()
}
This function takes two vectors of intervals. For each element of the first vector, it counts overlaps with every element of the second vector, then adds them up. Then turn it from a list back into a vector, and reclass it as a duration
for pretty-printing. The caveat here is that if there are any overlaps in the holidays
vector, those hours will be double-counted.
# days(1) since the holiday lasts all day
holiday_intervals <- as.interval(days(1), ymd(public_holidays$date))
shift_time %>%
mutate(
shift = interval(ymd_hms(started_at), ymd_hms(ended_at)),
holiday_hours = check_shift_against_holidays(shift, holiday_intervals)
)
started_at ended_at shift holiday_hours 1 2019-09-01 02:00:00 AEST 2019-09-01 11:30:00 AEST 2019-09-01 02:00:00 UTC--2019-09-01 11:30:00 UTC 0s 2 2019-09-02 05:00:00 AEST 2019-09-02 19:00:00 AEST 2019-09-02 05:00:00 UTC--2019-09-02 19:00:00 UTC 0s 3 2019-11-04 20:00:00 AEDT 2019-11-05 04:00:00 AEDT 2019-11-04 20:00:00 UTC--2019-11-05 04:00:00 UTC 14400s (~4 hours)
And if you're really opposed to creating any new intermediate variables:
shift_time %>%
mutate(
holiday_hours = check_shift_against_holidays(
ymd_hms(started_at) %--% ymd_hms(ended_at),
holiday_intervals
)
)
started_at ended_at holiday_hours 1 2019-09-01 02:00:00 AEST 2019-09-01 11:30:00 AEST 0s 2 2019-09-02 05:00:00 AEST 2019-09-02 19:00:00 AEST 0s 3 2019-11-04 20:00:00 AEDT 2019-11-05 04:00:00 AEDT 14400s (~4 hours)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With