
Check if POSIXct time is within interval

The problem: I have two data frames that I would like to merge depending on whether the date/time in one data frame falls within an interval in the other.

traffic: Date and Time (POSIXct), Frequency

mydata: Interval, Sum of Frequency

I would now like to check whether each POSIXct time from traffic is within an interval of mydata and, if this is TRUE, add its frequency to the column "Sum of Frequency" in mydata.

The two problems that I encountered:

  1. The traffic data frame has significantly more rows than mydata. I don't know how to tell R to loop through every observation in traffic and check it against one row in mydata.

  2. There can be more than one observation falling within an interval of mydata. I want R to add up the frequencies of all matching traffic observations to get a total. Also, the intervals are overlapping.

Here is the data:

DateTime <- c("2014-11-01 04:00:00", "2014-11-01 04:03:00",  "2014-11-01 04:06:00", "2014-11-01 04:08:00", "2014-11-01 04:10:00", "2014-11-01 04:12:00", "2015-08-01 04:13:00", "2015-08-01 04:45:00", "2015-08-01 14:15:00", "2015-08-01 14:13:00")
DateTime <- as.POSIXct(DateTime)
Frequency <- c(1,2,3,5,12,1,2,2,1,1)                 
traffic <- data.frame(DateTime, Frequency)

library(lubridate)
DateTime1 <- c("2014-11-01 04:00:00", "2015-08-01 04:03:00",  "2015-08-01 14:00:00")
DateTime2 <- c("2014-11-01 04:15:00", "2015-08-01 04:13:00",  "2015-08-01 14:15:00")
DateTime1 <- as.POSIXct(DateTime1)
DateTime2 <- as.POSIXct(DateTime2)
mydata <- data.frame(DateTime1, DateTime2)
mydata$Interval <- interval(DateTime1, DateTime2)
mydata$SumFrequency <- NA

The expected outcome should be something like this:

mydata$SumFrequency <- c(24, 2, 2)
head(mydata)

I tried int_overlaps from the lubridate package. Any tips on how to solve this are highly appreciated!

Anna2803 asked Mar 04 '16 08:03


4 Answers

A short solution with foverlaps from the data.table package:

library(data.table)

mydata <- data.table(DateTime1, DateTime2, key = c("DateTime1", "DateTime2"))
traffic <- data.table(start = DateTime, end = DateTime, Frequency, key = c("start","end"))

foverlaps(traffic, mydata, type="within", nomatch=0L)[, .(sumFreq = sum(Frequency)), 
                                                      by = .(DateTime1, DateTime2)]

which gives:

             DateTime1           DateTime2 sumFreq
1: 2014-11-01 04:00:00 2014-11-01 04:15:00      24
2: 2015-08-01 04:03:00 2015-08-01 04:13:00       2
3: 2015-08-01 14:00:00 2015-08-01 14:15:00       2
Jaap answered Nov 20 '22 09:11


A data.table approach that uses %between% to filter the traffic dataset by time:

library(data.table)

setDT(traffic)
setDT(mydata)

mydata[,SumFrequency := as.numeric(SumFrequency)] # coerce logical to numeric for next step.
mydata[,SumFrequency := sum( traffic[ DateTime %between% c(DateTime1, DateTime2), Frequency] ), by=1:nrow(mydata)] 

which gives:

             DateTime1           DateTime2                                           Interval SumFrequency
1: 2014-11-01 04:00:00 2014-11-01 04:15:00   2014-11-01 04:00:00 CET--2014-11-01 04:15:00 CET           24
2: 2015-08-01 04:03:00 2015-08-01 04:13:00 2015-08-01 04:03:00 CEST--2015-08-01 04:13:00 CEST            2
3: 2015-08-01 14:00:00 2015-08-01 14:15:00 2015-08-01 14:00:00 CEST--2015-08-01 14:15:00 CEST            2

If there are a lot of rows in mydata, it could be better to create an index column and use it in the by clause:

mydata[, idx := .I]
mydata[, SumFrequency := sum( traffic[DateTime %between% c(DateTime1, DateTime2),Frequency] ),by=idx]

And this gives:

             DateTime1           DateTime2                                           Interval SumFrequency idx
1: 2014-11-01 04:00:00 2014-11-01 04:15:00   2014-11-01 04:00:00 CET--2014-11-01 04:15:00 CET           24   1
2: 2015-08-01 04:03:00 2015-08-01 04:13:00 2015-08-01 04:03:00 CEST--2015-08-01 04:13:00 CEST            2   2
3: 2015-08-01 14:00:00 2015-08-01 14:15:00 2015-08-01 14:00:00 CEST--2015-08-01 14:15:00 CEST            2   3
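
For comparison, the same window test can be sketched in base R without data.table. Like %between% with its default settings, the comparison below includes both bounds. This is only an illustration on the sample data from the question; the tz = "UTC" argument and the helper name sum_in_window are assumptions added for reproducibility and are not part of the original answer.

```r
# Sample data from the question; tz = "UTC" is an assumption so that the
# result does not depend on the local time zone.
traffic <- data.frame(
  DateTime = as.POSIXct(c(
    "2014-11-01 04:00:00", "2014-11-01 04:03:00", "2014-11-01 04:06:00",
    "2014-11-01 04:08:00", "2014-11-01 04:10:00", "2014-11-01 04:12:00",
    "2015-08-01 04:13:00", "2015-08-01 04:45:00", "2015-08-01 14:15:00",
    "2015-08-01 14:13:00"), tz = "UTC"),
  Frequency = c(1, 2, 3, 5, 12, 1, 2, 2, 1, 1)
)
mydata <- data.frame(
  DateTime1 = as.POSIXct(c("2014-11-01 04:00:00", "2015-08-01 04:03:00",
                           "2015-08-01 14:00:00"), tz = "UTC"),
  DateTime2 = as.POSIXct(c("2014-11-01 04:15:00", "2015-08-01 04:13:00",
                           "2015-08-01 14:15:00"), tz = "UTC")
)

# Hypothetical helper: total frequency of the traffic rows that fall inside
# window i of mydata, with both endpoints included.
sum_in_window <- function(i) {
  inside <- traffic$DateTime >= mydata$DateTime1[i] &
    traffic$DateTime <= mydata$DateTime2[i]
  sum(traffic$Frequency[inside])
}

# Iterate over row indices so the POSIXct class is never stripped.
mydata$SumFrequency <- vapply(seq_len(nrow(mydata)), sum_in_window, numeric(1))
mydata
```

Each window scans the whole traffic vector, so the cost grows with nrow(mydata) * nrow(traffic); for large inputs a keyed join is likely to scale better.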
Tensibai answered Nov 20 '22 11:11


I see two solutions:

With data.frame and plyr

You could do it using the %within% function from lubridate, either with a for-loop or with one of the plyr loop functions such as dlply:

DateTime <- c("2014-11-01 04:00:00", "2014-11-01 04:03:00",  "2014-11-01 04:06:00", "2014-11-01 04:08:00", "2014-11-01 04:10:00", "2014-11-01 04:12:00", "2015-08-01 04:13:00", "2015-08-01 04:45:00", "2015-08-01 14:15:00", "2015-08-01 14:13:00")
DateTime <- as.POSIXct(DateTime)
Frequency <- c(1,2,3,5,12,1,2,2,1,1)                 
traffic <- data.frame(DateTime, Frequency)

library(lubridate)
DateTime1 <- c("2014-11-01 04:00:00", "2015-08-01 04:03:00",  "2015-08-01 14:00:00")
DateTime2 <- c("2014-11-01 04:15:00", "2015-08-01 04:13:00",  "2015-08-01 14:15:00")
DateTime1 <- as.POSIXct(DateTime1)
DateTime2 <- as.POSIXct(DateTime2)
mydata <- data.frame(DateTime1, DateTime2)
mydata$Interval <- interval(DateTime1, DateTime2)

library(plyr)
# Create a group-by variable
mydata$NumInt <- 1:nrow(mydata)
mydata$SumFrequency <- dlply(mydata, .(NumInt),
                             function(row){
                               sum(
                                 traffic[traffic$DateTime %within% row$Interval, "Frequency"]
                               )
                             })

mydata
#>             DateTime1           DateTime2
#> 1 2014-11-01 04:00:00 2014-11-01 04:15:00
#> 2 2015-08-01 04:03:00 2015-08-01 04:13:00
#> 3 2015-08-01 14:00:00 2015-08-01 14:15:00
#>                                             Interval NumInt SumFrequency
#> 1   2014-11-01 04:00:00 CET--2014-11-01 04:15:00 CET      1           24
#> 2 2015-08-01 04:03:00 CEST--2015-08-01 04:13:00 CEST      2            2
#> 3 2015-08-01 14:00:00 CEST--2015-08-01 14:15:00 CEST      3            2
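
The for-loop variant mentioned above could look roughly like the sketch below. It builds each interval with lubridate's interval() and tests all traffic timestamps with %within%, which includes both endpoints. The tz = "UTC" argument is an assumption added so that the result does not depend on the local time zone.

```r
library(lubridate)

# Sample data from the question; tz = "UTC" is an assumption for reproducibility.
traffic <- data.frame(
  DateTime = as.POSIXct(c(
    "2014-11-01 04:00:00", "2014-11-01 04:03:00", "2014-11-01 04:06:00",
    "2014-11-01 04:08:00", "2014-11-01 04:10:00", "2014-11-01 04:12:00",
    "2015-08-01 04:13:00", "2015-08-01 04:45:00", "2015-08-01 14:15:00",
    "2015-08-01 14:13:00"), tz = "UTC"),
  Frequency = c(1, 2, 3, 5, 12, 1, 2, 2, 1, 1)
)
mydata <- data.frame(
  DateTime1 = as.POSIXct(c("2014-11-01 04:00:00", "2015-08-01 04:03:00",
                           "2015-08-01 14:00:00"), tz = "UTC"),
  DateTime2 = as.POSIXct(c("2014-11-01 04:15:00", "2015-08-01 04:13:00",
                           "2015-08-01 14:15:00"), tz = "UTC")
)

mydata$SumFrequency <- NA_real_
for (i in seq_len(nrow(mydata))) {
  # Interval for row i of mydata
  int_i <- interval(mydata$DateTime1[i], mydata$DateTime2[i])
  # Logical vector: which traffic timestamps fall inside that interval
  inside <- traffic$DateTime %within% int_i
  mydata$SumFrequency[i] <- sum(traffic$Frequency[inside])
}
mydata
```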

With data.table and the foverlaps function

data.table implements a function for overlapping joins, foverlaps, that you could use in your case with a little trick (I use data.table 1.9.6 below).
(See "How to perform join over date ranges using data.table?" and this presentation.)

Note that you do not need to create an interval with lubridate.

library(data.table)

DateTime <- c("2014-11-01 04:00:00", "2014-11-01 04:03:00",  "2014-11-01 04:06:00", "2014-11-01 04:08:00", "2014-11-01 04:10:00", "2014-11-01 04:12:00", "2015-08-01 04:13:00", "2015-08-01 04:45:00", "2015-08-01 14:15:00", "2015-08-01 14:13:00")
DateTime <- as.POSIXct(DateTime)
Frequency <- c(1,2,3,5,12,1,2,2,1,1)                 
traffic <- data.table(DateTime, Frequency)

library(lubridate)
DateTime1 <- c("2014-11-01 04:00:00", "2015-08-01 04:03:00",  "2015-08-01 14:00:00")
DateTime2 <- c("2014-11-01 04:15:00", "2015-08-01 04:13:00",  "2015-08-01 14:15:00")
mydata <- data.table(DateTime1 = as.POSIXct(DateTime1), DateTime2 = as.POSIXct(DateTime2))

# Use function `foverlaps`  for overlapping joins

# Here's the trick: create a dummy end column so that each timestamp becomes a zero-length interval
traffic[, dummy:=DateTime]
setkey(mydata, DateTime1, DateTime2)

# do the join
mydata2 <- foverlaps(traffic, mydata, by.x=c("DateTime", "dummy"), type ="within", nomatch=0L)[, dummy := NULL][]
mydata2
#>              DateTime1           DateTime2            DateTime Frequency
#> 1: 2014-11-01 04:00:00 2014-11-01 04:15:00 2014-11-01 04:00:00         1
#> 2: 2014-11-01 04:00:00 2014-11-01 04:15:00 2014-11-01 04:03:00         2
#> 3: 2014-11-01 04:00:00 2014-11-01 04:15:00 2014-11-01 04:06:00         3
#> 4: 2014-11-01 04:00:00 2014-11-01 04:15:00 2014-11-01 04:08:00         5
#> 5: 2014-11-01 04:00:00 2014-11-01 04:15:00 2014-11-01 04:10:00        12
#> 6: 2014-11-01 04:00:00 2014-11-01 04:15:00 2014-11-01 04:12:00         1
#> 7: 2015-08-01 04:03:00 2015-08-01 04:13:00 2015-08-01 04:13:00         2
#> 8: 2015-08-01 14:00:00 2015-08-01 14:15:00 2015-08-01 14:15:00         1
#> 9: 2015-08-01 14:00:00 2015-08-01 14:15:00 2015-08-01 14:13:00         1

# summarise with a sum by grouping by each line of mydata
setkeyv(mydata2, key(mydata))
mydata2[mydata, .(SumFrequency = sum(Frequency)), by = .EACHI]
#>              DateTime1           DateTime2 SumFrequency
#> 1: 2014-11-01 04:00:00 2014-11-01 04:15:00           24
#> 2: 2015-08-01 04:03:00 2015-08-01 04:13:00            2
#> 3: 2015-08-01 14:00:00 2015-08-01 14:15:00            2
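
As an alternative to the dummy-column trick, data.table releases from 1.9.8 onward support non-equi joins in the on argument, so the range condition can be written directly. The sketch below is a different technique from the foverlaps call above, not the method used in this answer, and it assumes such a newer data.table version; tz = "UTC" is also an assumption added for reproducibility.

```r
library(data.table)

# Sample data from the question; tz = "UTC" is an assumption for reproducibility.
traffic <- data.table(
  DateTime = as.POSIXct(c(
    "2014-11-01 04:00:00", "2014-11-01 04:03:00", "2014-11-01 04:06:00",
    "2014-11-01 04:08:00", "2014-11-01 04:10:00", "2014-11-01 04:12:00",
    "2015-08-01 04:13:00", "2015-08-01 04:45:00", "2015-08-01 14:15:00",
    "2015-08-01 14:13:00"), tz = "UTC"),
  Frequency = c(1, 2, 3, 5, 12, 1, 2, 2, 1, 1)
)
mydata <- data.table(
  DateTime1 = as.POSIXct(c("2014-11-01 04:00:00", "2015-08-01 04:03:00",
                           "2015-08-01 14:00:00"), tz = "UTC"),
  DateTime2 = as.POSIXct(c("2014-11-01 04:15:00", "2015-08-01 04:13:00",
                           "2015-08-01 14:15:00"), tz = "UTC")
)

# For each row of mydata, match the traffic rows whose DateTime lies in
# [DateTime1, DateTime2] and sum their Frequency (by = .EACHI groups by
# each row of mydata).
res <- traffic[mydata,
               .(SumFrequency = sum(Frequency)),
               on = .(DateTime >= DateTime1, DateTime <= DateTime2),
               by = .EACHI]
res
```

In the result, the two join columns take their names from traffic (DateTime) but hold the interval bounds from mydata, so it may be worth renaming them before further use.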
cderv answered Nov 20 '22 10:11


As far as point 2 is concerned, you can use aggregate, for instance:

aggData <- aggregate(traffic$Frequency~format(traffic$DateTime, "%Y%m%d %H:%M"), data=traffic, sum)

This sums all frequencies in minute intervals.
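
A minimal sketch of per-minute aggregation, assuming a small made-up data set in which two observations share a minute. The data values and the helper column Minute are hypothetical and only serve to show the grouping; tz = "UTC" is an assumption for reproducibility.

```r
# Hypothetical data: the first two observations fall in the same minute.
traffic <- data.frame(
  DateTime = as.POSIXct(c("2014-11-01 04:00:10", "2014-11-01 04:00:50",
                          "2014-11-01 04:01:05"), tz = "UTC"),
  Frequency = c(1, 2, 5)
)

# Truncate each timestamp to the minute; %H and %M are the hour and minute
# conversion codes (a bare "h:m" would be copied literally into the string).
traffic$Minute <- format(traffic$DateTime, "%Y-%m-%d %H:%M")

# Sum Frequency within each minute.
aggData <- aggregate(Frequency ~ Minute, data = traffic, FUN = sum)
aggData
```

Note that this groups by fixed calendar minutes, not by the arbitrary start and end times stored in mydata.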

And for point 1: wouldn't a merge work?

merge(x = mydata, y = aggData, by = "DateTime", all.x = TRUE)

The outer merge is explained here

Etienne Moerman answered Nov 20 '22 09:11