
Test if date occurs in multiple date ranges with R

I have a data frame with multiple date ranges (45 to be exact):

Range  Start       End
1      2014-01-01  2014-02-30
2      2015-01-10  2015-03-30
3      2016-04-20  2016-10-12
...    ...         ...

The ranges never overlap.

I also have a data frame with various event dates (200K+):

Event  Date
1      2014-01-02
2      2014-03-20
3      2015-04-01
4      2016-08-18
...    ...

I want to test if these dates fall within any of these ranges:

Event  Date        InRange
1      2014-01-02  TRUE
2      2014-03-20  FALSE
3      2015-04-01  FALSE
4      2016-08-18  TRUE
...

What is the best way to perform this test? I have looked at lubridate's between and interval functions, as well as various Stack Overflow questions, but cannot find a good solution.

RUser asked Jun 21 '16 00:06


2 Answers

You can expand the date ranges from the first data frame into a single vector of days, then use the %in% operator to check whether each event date appears in that vector. Assuming your first data frame is dateRange and the second is events, this fits in one line:

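# compare plain day counts on both sides, so the match does not depend on how %in% treats Date objects
events$InRange <- as.numeric(events$Date) %in% unlist(Map(`:`, as.numeric(dateRange$Start), as.numeric(dateRange$End)))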

events
  Event       Date InRange
1     1 2014-01-02    TRUE
2     2 2014-03-20   FALSE
3     3 2015-04-01   FALSE
4     4 2016-08-18    TRUE

Here Map builds the vector of covered days. Combined with the : operator, it creates a list with one day sequence per range, running from Start to End. Symbolically (this is not valid R), the result is something like list(2014-01-01 : 2014-02-30, 2015-01-10 : 2015-03-30, 2016-04-20 : 2016-10-12, ...). unlist then flattens that list into a single vector, which can be used directly with %in%.
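To make the Map / unlist step concrete, here is a minimal sketch on a made-up two-range sample (the short ranges, event_dates, and the helper variable names are assumptions for illustration, not the asker's data). It converts the dates to day counts first, so every intermediate value is a plain number:

```r
# Hypothetical two-range sample; column names follow the answer above.
dateRange <- data.frame(
  Start = as.Date(c("2014-01-01", "2015-01-10")),
  End   = as.Date(c("2014-01-03", "2015-01-11"))
)

# Work on day counts (days since 1970-01-01) so each step is numeric.
start_days <- as.numeric(dateRange$Start)
end_days   <- as.numeric(dateRange$End)

per_range <- Map(`:`, start_days, end_days)  # list: one day sequence per range
all_days  <- unlist(per_range)               # one flat vector of covered days

lengths(per_range)
# 3 2
as.Date(all_days, origin = "1970-01-01")
# "2014-01-01" "2014-01-02" "2014-01-03" "2015-01-10" "2015-01-11"

# Hypothetical event dates, tested against the flattened vector.
event_dates <- as.Date(c("2014-01-02", "2014-01-04", "2015-01-11"))
in_range <- as.numeric(event_dates) %in% all_days
in_range
# TRUE FALSE TRUE
```

The expanded vector holds one entry per covered day, so with 45 ranges it stays small; for very long ranges, an approach that does not enumerate days may be preferable.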

Psidom answered Oct 15 '22 05:10


Since the intervals in your first "data.frame" are ordered and non-overlapping, you can test, for each event date, whether it falls at or after a $Start but before the corresponding $End. Using findInterval for this avoids comparing every date against every range and keeps memory use low:

findInterval(events$Date, ranges$Start) > findInterval(events$Date, ranges$End)
#[1]  TRUE FALSE FALSE  TRUE

With data (the invalid date "2014-02-30" replaced by "2014-02-28"):

ranges <- data.frame(
  Range = 1:3,
  Start = as.Date(c("2014-01-01", "2015-01-10", "2016-04-20")),
  End   = as.Date(c("2014-02-28", "2015-03-30", "2016-10-12"))
)

events <- data.frame(
  Event = 1:4,
  Date  = as.Date(c("2014-01-02", "2014-03-20", "2015-04-01", "2016-08-18"))
)
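
One detail worth checking is the boundary: with the default arguments, an event that falls exactly on an $End date is counted as outside its range. If the End dates should be inclusive, a possible variant (a sketch only; in_any_range is a hypothetical helper name, and the extra test dates are made up) is to count the End dates strictly before each event using findInterval's left.open argument:

```r
# Sample ranges matching the data above (with "2014-02-28" as the first End).
ranges <- data.frame(
  Start = as.Date(c("2014-01-01", "2015-01-10", "2016-04-20")),
  End   = as.Date(c("2014-02-28", "2015-03-30", "2016-10-12"))
)

# Hypothetical helper: TRUE when Start <= date <= End for some range.
# Assumes the ranges are sorted and do not overlap.
in_any_range <- function(dates, start, end) {
  d <- as.numeric(dates)
  n_started <- findInterval(d, as.numeric(start))                  # ranges with Start <= date
  n_ended   <- findInterval(d, as.numeric(end), left.open = TRUE)  # ranges with End < date
  n_started > n_ended
}

# The four events from the question, plus one date on an End and one on a Start.
dates <- as.Date(c("2014-01-02", "2014-03-20", "2015-04-01", "2016-08-18",
                   "2015-03-30", "2016-04-20"))

inclusive <- in_any_range(dates, ranges$Start, ranges$End)
inclusive
# TRUE FALSE FALSE TRUE TRUE TRUE

# The default comparison treats the End date itself as outside the range.
exclusive <- findInterval(dates, ranges$Start) > findInterval(dates, ranges$End)
exclusive
# TRUE FALSE FALSE TRUE FALSE TRUE
```

Both versions agree on the four events from the question; they differ only for dates that coincide with an End.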
alexis_laz answered Oct 15 '22 05:10