Check if POSIXct time is within interval

Question

The problem: I have two data frames that I would like to merge, depending on whether the date/time in one data frame falls within an interval of the other.

traffic: Date and Time (POSIXct), Frequency

mydata: Interval, Sum of Frequency

For each row of mydata, I would like to check which POSIXct times in traffic fall within its interval and, where this is TRUE, add the corresponding Frequency to the SumFrequency column of mydata.

I encountered two problems:

1. The traffic data frame has significantly more rows than mydata. I don't know how to tell R to loop through every observation in traffic and check it against one row in mydata.

2. More than one observation can fall within an interval of mydata. I want R to add up the frequencies of all matching traffic observations to get a total. The intervals are also overlapping.

Here is the data:

DateTime <- c("2014-11-01 04:00:00", "2014-11-01 04:03:00",  "2014-11-01 04:06:00", "2014-11-01 04:08:00", "2014-11-01 04:10:00", "2014-11-01 04:12:00", "2015-08-01 04:13:00", "2015-08-01 04:45:00", "2015-08-01 14:15:00", "2015-08-01 14:13:00")
DateTime <- as.POSIXct(DateTime)
Frequency <- c(1,2,3,5,12,1,2,2,1,1)                 
traffic <- data.frame(DateTime, Frequency)

library(lubridate)
DateTime1 <- c("2014-11-01 04:00:00", "2015-08-01 04:03:00",  "2015-08-01 14:00:00")
DateTime2 <- c("2014-11-01 04:15:00", "2015-08-01 04:13:00",  "2015-08-01 14:15:00")
DateTime1 <- as.POSIXct(DateTime1)
DateTime2 <- as.POSIXct(DateTime2)
mydata <- data.frame(DateTime1, DateTime2)
mydata$Interval <- as.interval(DateTime1, DateTime2)
mydata$SumFrequency <- NA

The expected outcome should be something like this:

mydata$SumFrequency <- c(24, 2, 2)
head(mydata)

I tried int_overlaps from the lubridate package. Any tips on how to solve this are highly appreciated!

Jaap · Accepted Answer

A short solution with foverlaps from the data.table package:

library(data.table)
mydata <- data.table(DateTime1, DateTime2, key = c("DateTime1", "DateTime2"))
traffic <- data.table(start = DateTime, end = DateTime, Frequency, key = c("start","end"))

foverlaps(traffic, mydata, type="within", nomatch=0L)[, .(sumFreq = sum(Frequency)), 
                                                      by = .(DateTime1, DateTime2)]

which gives:

             DateTime1           DateTime2 sumFreq
1: 2014-11-01 04:00:00 2014-11-01 04:15:00      24
2: 2015-08-01 04:03:00 2015-08-01 04:13:00       2
3: 2015-08-01 14:00:00 2015-08-01 14:15:00       2
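
For comparison, the same totals can also be sketched as a non-equi join, which data.table supports from version 1.9.8 onward. This is an alternative to the foverlaps call above, not part of it. The sample data from the question is rebuilt so the block runs on its own, and tz = "UTC" is an assumption made only to keep the result independent of the local time zone.

```r
library(data.table)

# Sample data from the question; tz = "UTC" is an assumption for reproducibility
traffic <- data.table(
  DateTime = as.POSIXct(c("2014-11-01 04:00:00", "2014-11-01 04:03:00",
                          "2014-11-01 04:06:00", "2014-11-01 04:08:00",
                          "2014-11-01 04:10:00", "2014-11-01 04:12:00",
                          "2015-08-01 04:13:00", "2015-08-01 04:45:00",
                          "2015-08-01 14:15:00", "2015-08-01 14:13:00"),
                        tz = "UTC"),
  Frequency = c(1, 2, 3, 5, 12, 1, 2, 2, 1, 1)
)
mydata <- data.table(
  DateTime1 = as.POSIXct(c("2014-11-01 04:00:00", "2015-08-01 04:03:00",
                           "2015-08-01 14:00:00"), tz = "UTC"),
  DateTime2 = as.POSIXct(c("2014-11-01 04:15:00", "2015-08-01 04:13:00",
                           "2015-08-01 14:15:00"), tz = "UTC")
)

# For each row of mydata (by = .EACHI), sum the Frequency of all traffic rows
# whose DateTime lies in [DateTime1, DateTime2], both bounds included
res <- traffic[mydata,
               on = .(DateTime >= DateTime1, DateTime <= DateTime2),
               .(sumFreq = sum(Frequency)),
               by = .EACHI]

res$sumFreq
#> [1] 24  2  2
```

In the result, both join columns are named DateTime and hold the interval bounds. With the default nomatch, an interval that contains no traffic observation gets NA rather than 0.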

Tensibai · Answer

A data.table approach that uses %between% to filter the traffic dataset by time:

library(data.table)
setDT(traffic)
setDT(mydata)

mydata[, SumFrequency := as.numeric(SumFrequency)] # coerce the logical NA column to numeric for the next step
mydata[, SumFrequency := sum(traffic[DateTime %between% c(DateTime1, DateTime2), Frequency]), by = 1:nrow(mydata)]

which gives:

             DateTime1           DateTime2                                           Interval SumFrequency
1: 2014-11-01 04:00:00 2014-11-01 04:15:00   2014-11-01 04:00:00 CET--2014-11-01 04:15:00 CET           24
2: 2015-08-01 04:03:00 2015-08-01 04:13:00 2015-08-01 04:03:00 CEST--2015-08-01 04:13:00 CEST            2
3: 2015-08-01 14:00:00 2015-08-01 14:15:00 2015-08-01 14:00:00 CEST--2015-08-01 14:15:00 CEST            2

If mydata has many rows, it may be better to create an index column and use it in the by clause:

mydata[, idx := .I]
mydata[, SumFrequency := sum( traffic[DateTime %between% c(DateTime1, DateTime2),Frequency] ),by=idx]

And this gives:

             DateTime1           DateTime2                                           Interval SumFrequency idx
1: 2014-11-01 04:00:00 2014-11-01 04:15:00   2014-11-01 04:00:00 CET--2014-11-01 04:15:00 CET           24   1
2: 2015-08-01 04:03:00 2015-08-01 04:13:00 2015-08-01 04:03:00 CEST--2015-08-01 04:13:00 CEST            2   2
3: 2015-08-01 14:00:00 2015-08-01 14:15:00 2015-08-01 14:00:00 CEST--2015-08-01 14:15:00 CEST            2   3
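
The row-by-row logic used here can also be written in base R without any package, which may make it easier to see what happens for each interval. This is only a sketch: sum_in_interval is a hypothetical helper name, the data is rebuilt from the question, and tz = "UTC" is an assumption for reproducibility.

```r
# Sample data from the question as plain data frames; tz = "UTC" is an assumption
traffic <- data.frame(
  DateTime = as.POSIXct(c("2014-11-01 04:00:00", "2014-11-01 04:03:00",
                          "2014-11-01 04:06:00", "2014-11-01 04:08:00",
                          "2014-11-01 04:10:00", "2014-11-01 04:12:00",
                          "2015-08-01 04:13:00", "2015-08-01 04:45:00",
                          "2015-08-01 14:15:00", "2015-08-01 14:13:00"),
                        tz = "UTC"),
  Frequency = c(1, 2, 3, 5, 12, 1, 2, 2, 1, 1)
)
mydata <- data.frame(
  DateTime1 = as.POSIXct(c("2014-11-01 04:00:00", "2015-08-01 04:03:00",
                           "2015-08-01 14:00:00"), tz = "UTC"),
  DateTime2 = as.POSIXct(c("2014-11-01 04:15:00", "2015-08-01 04:13:00",
                           "2015-08-01 14:15:00"), tz = "UTC")
)

# Hypothetical helper: total Frequency of the traffic rows inside one
# closed interval [start, end]
sum_in_interval <- function(start, end) {
  inside <- traffic$DateTime >= start & traffic$DateTime <= end
  sum(traffic$Frequency[inside])
}

# Apply the helper once per row of mydata
mydata$SumFrequency <- vapply(
  seq_len(nrow(mydata)),
  function(i) sum_in_interval(mydata$DateTime1[i], mydata$DateTime2[i]),
  numeric(1)
)

mydata$SumFrequency
#> [1] 24  2  2
```

Unlike a join, an interval without any matching observation gets 0 here, because the sum of an empty vector is 0.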

cderv · Answer

I see two solutions:

With `data.frame` and `plyr`

You could do it with the %within% function from lubridate, either in a for-loop or with one of the plyr loop functions such as dlply.

DateTime <- c("2014-11-01 04:00:00", "2014-11-01 04:03:00",  "2014-11-01 04:06:00", "2014-11-01 04:08:00", "2014-11-01 04:10:00", "2014-11-01 04:12:00", "2015-08-01 04:13:00", "2015-08-01 04:45:00", "2015-08-01 14:15:00", "2015-08-01 14:13:00")
DateTime <- as.POSIXct(DateTime)
Frequency <- c(1,2,3,5,12,1,2,2,1,1)                 
traffic <- data.frame(DateTime, Frequency)

library(lubridate)
DateTime1 <- c("2014-11-01 04:00:00", "2015-08-01 04:03:00",  "2015-08-01 14:00:00")
DateTime2 <- c("2014-11-01 04:15:00", "2015-08-01 04:13:00",  "2015-08-01 14:15:00")
DateTime1 <- as.POSIXct(DateTime1)
DateTime2 <- as.POSIXct(DateTime2)
mydata <- data.frame(DateTime1, DateTime2)
mydata$Interval <- as.interval(DateTime1, DateTime2)

library(plyr)
# Create a group-by variable
mydata$NumInt <- 1:nrow(mydata)
mydata$SumFrequency <- dlply(mydata, .(NumInt),
                             function(row){
                               sum(
                                 traffic[traffic$DateTime %within% row$Interval, "Frequency"]
                               )
                             })

mydata
#>             DateTime1           DateTime2
#> 1 2014-11-01 04:00:00 2014-11-01 04:15:00
#> 2 2015-08-01 04:03:00 2015-08-01 04:13:00
#> 3 2015-08-01 14:00:00 2015-08-01 14:15:00
#>                                             Interval NumInt SumFrequency
#> 1   2014-11-01 04:00:00 CET--2014-11-01 04:15:00 CET      1           24
#> 2 2015-08-01 04:03:00 CEST--2015-08-01 04:13:00 CEST      2            2
#> 3 2015-08-01 14:00:00 CEST--2015-08-01 14:15:00 CEST      3            2
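
The for-loop variant mentioned above is not shown in the answer; a minimal sketch of it could look like the following. It uses lubridate's interval() constructor and %within%, rebuilds the question's data, and assumes tz = "UTC" only so that the result does not depend on the local time zone.

```r
library(lubridate)

# Sample data from the question; tz = "UTC" is an assumption for reproducibility
traffic <- data.frame(
  DateTime = as.POSIXct(c("2014-11-01 04:00:00", "2014-11-01 04:03:00",
                          "2014-11-01 04:06:00", "2014-11-01 04:08:00",
                          "2014-11-01 04:10:00", "2014-11-01 04:12:00",
                          "2015-08-01 04:13:00", "2015-08-01 04:45:00",
                          "2015-08-01 14:15:00", "2015-08-01 14:13:00"),
                        tz = "UTC"),
  Frequency = c(1, 2, 3, 5, 12, 1, 2, 2, 1, 1)
)
mydata <- data.frame(
  DateTime1 = as.POSIXct(c("2014-11-01 04:00:00", "2015-08-01 04:03:00",
                           "2015-08-01 14:00:00"), tz = "UTC"),
  DateTime2 = as.POSIXct(c("2014-11-01 04:15:00", "2015-08-01 04:13:00",
                           "2015-08-01 14:15:00"), tz = "UTC")
)
mydata$Interval <- interval(mydata$DateTime1, mydata$DateTime2)

# Loop over the intervals; %within% is TRUE for every traffic time that
# falls inside the i-th interval (bounds included)
mydata$SumFrequency <- NA_real_
for (i in seq_len(nrow(mydata))) {
  inside <- traffic$DateTime %within% mydata$Interval[i]
  mydata$SumFrequency[i] <- sum(traffic$Frequency[inside])
}

mydata$SumFrequency
#> [1] 24  2  2
```

This stores SumFrequency as an ordinary numeric column, whereas dlply returns a list.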

With `data.table` and functions `foverlaps`

data.table implements a function for overlapping joins, foverlaps, that you can use in your case with a little trick (I use data.table 1.9.6 below).
(See "How to perform join over date ranges using data.table?" and this presentation.)

Note that you do not need to create an interval with lubridate.

library(data.table)
DateTime <- c("2014-11-01 04:00:00", "2014-11-01 04:03:00",  "2014-11-01 04:06:00", "2014-11-01 04:08:00", "2014-11-01 04:10:00", "2014-11-01 04:12:00", "2015-08-01 04:13:00", "2015-08-01 04:45:00", "2015-08-01 14:15:00", "2015-08-01 14:13:00")
DateTime <- as.POSIXct(DateTime)
Frequency <- c(1,2,3,5,12,1,2,2,1,1)                 
traffic <- data.table(DateTime, Frequency)

DateTime1 <- c("2014-11-01 04:00:00", "2015-08-01 04:03:00",  "2015-08-01 14:00:00")
DateTime2 <- c("2014-11-01 04:15:00", "2015-08-01 04:13:00",  "2015-08-01 14:15:00")
mydata <- data.table(DateTime1 = as.POSIXct(DateTime1), DateTime2 = as.POSIXct(DateTime2))

# Use function `foverlaps`  for overlapping joins

# Here's the trick : create a dummy variable to artificially have an interval
traffic[, dummy:=DateTime]
setkey(mydata, DateTime1, DateTime2)

# do the join
mydata2 <- foverlaps(traffic, mydata, by.x=c("DateTime", "dummy"), type ="within", nomatch=0L)[, dummy := NULL][]
mydata2
#>              DateTime1           DateTime2            DateTime Frequency
#> 1: 2014-11-01 04:00:00 2014-11-01 04:15:00 2014-11-01 04:00:00         1
#> 2: 2014-11-01 04:00:00 2014-11-01 04:15:00 2014-11-01 04:03:00         2
#> 3: 2014-11-01 04:00:00 2014-11-01 04:15:00 2014-11-01 04:06:00         3
#> 4: 2014-11-01 04:00:00 2014-11-01 04:15:00 2014-11-01 04:08:00         5
#> 5: 2014-11-01 04:00:00 2014-11-01 04:15:00 2014-11-01 04:10:00        12
#> 6: 2014-11-01 04:00:00 2014-11-01 04:15:00 2014-11-01 04:12:00         1
#> 7: 2015-08-01 04:03:00 2015-08-01 04:13:00 2015-08-01 04:13:00         2
#> 8: 2015-08-01 14:00:00 2015-08-01 14:15:00 2015-08-01 14:15:00         1
#> 9: 2015-08-01 14:00:00 2015-08-01 14:15:00 2015-08-01 14:13:00         1

# summarise with a sum by grouping by each line of mydata
setkeyv(mydata2, key(mydata))
mydata2[mydata, .(SumFrequency = sum(Frequency)), by = .EACHI]
#>              DateTime1           DateTime2 SumFrequency
#> 1: 2014-11-01 04:00:00 2014-11-01 04:15:00           24
#> 2: 2015-08-01 04:03:00 2015-08-01 04:13:00            2
#> 3: 2015-08-01 14:00:00 2015-08-01 14:15:00            2

Etienne Moerman · Answer

As far as point 2 is concerned, you can use aggregate, for instance:

aggData <- aggregate(traffic$Frequency~format(traffic$DateTime, "%Y%m%d %H:%M"), data=traffic, sum)

This sums all frequencies per minute.

And for point 1: wouldn't a merge work?

merge(x = mydata, y = aggData, by = "DateTime", all.x = TRUE)

The outer merge is explained here

Tags: r, count, intervals, lubridate

Anna2803
