Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R group variables in days calculated from interval between start and end time

I have a dataframe as below

tmpdf <- data.frame(licensePlate = c("Y80901", "Y80901", "Y80901", "AMG-999", "AMG-999", "W3188", "W3188"),  
starttime= c("2015-09-18 09:55", "2015-09-18 23:00", "2015-09-20 15:00", "2015-09-17 15:42", "2015-09-21 09:22", "2015-09-17 09:00", "2015-09-21 14:00"),
endtime = c("2015-09-18 17:55", "2015-09-20 11:00", "2015-09-21 12:00",  "2015-09-18 13:00",  "2015-09-21 14:22", "2015-09-21 12:00", "2015-09-21 16:00"))
    tmpdf
      licensePlate        starttime          endtime
    1       Y80901 2015-09-18 09:55 2015-09-18 17:55
    2       Y80901 2015-09-18 23:00 2015-09-20 11:00
    3       Y80901 2015-09-20 15:00 2015-09-21 12:00
    4      AMG-999 2015-09-17 15:42 2015-09-18 13:00
    5      AMG-999 2015-09-21 09:22 2015-09-21 14:22
    6        W3188 2015-09-17 09:00 2015-09-21 12:00
    7        W3188 2015-09-21 14:00 2015-09-21 16:00

I want to count the last n days (for example, last 5 days from 9/17 to 9/21) in time used per day of each licensePlate, My expected result is as below:

   Period            LicensePlate        Used Time   

1 2015-09-17         Y80901              0
2 2015-09-17         AMG-999             8.3     
3 2015-09-17         W3188               15
4 2015-09-18         Y80901              9
5 2015-09-18         AMG-999             13
6 2015-09-18         W3188               24
7 2015-09-19         Y80901              24
8 2015-09-19         AMG-999             0
9 2015-09-19         W3188               24
10 2015-09-20        Y80901              20
11 2015-09-20        AMG-999             0
12 2015-09-20        W3188               24
13 2015-09-21        Y80901              12
14 2015-09-21        AMG-999             5
15 2015-09-21        W3188               14

I think dplyr/data.table and lubridate can be used to obtain my result, and I may need to measure time period in days, but I don't know how to cut within a start/end interval when start/end will differ per row.

like image 644
James Chien Avatar asked Sep 26 '22 16:09

James Chien


1 Answers

Here's something to get you started. This is almost your desired output, as it doesn't show you the missing licensePlate per period.

The first step would be to convert your date to valid POSIXct class, then expand the data to a per minute level (probably the most costly part in this solution) and the aggregate by licensePlate and by Period while summing up the results (I'm not using as.Date here because it handles badly POSIX values with values between 00 and 1 am).

library(data.table)
setDT(tmpdf)[, `:=`(starttime = as.POSIXct(starttime), endtime = as.POSIXct(endtime))]
res <- tmpdf[, .(licensePlate, Period = seq(starttime, endtime, by = "1 min")), by = 1:nrow(tmpdf)]
res[, .(Used_Time = round(.N/60L, 1L)), keyby = .(Period = substr(Period, 1L, 10L), licensePlate)]
#         Period licensePlate Used_Time
#  1: 2015-09-17      AMG-999       8.3
#  2: 2015-09-17        W3188      15.0
#  3: 2015-09-18      AMG-999      13.0
#  4: 2015-09-18        W3188      24.0
#  5: 2015-09-18       Y80901       9.0
#  6: 2015-09-19        W3188      24.0
#  7: 2015-09-19       Y80901      24.0
#  8: 2015-09-20        W3188      24.0
#  9: 2015-09-20       Y80901      20.0
# 10: 2015-09-21      AMG-999       5.0
# 11: 2015-09-21        W3188      14.0
# 12: 2015-09-21       Y80901      12.0
like image 115
David Arenburg Avatar answered Sep 29 '22 05:09

David Arenburg