
Select rows within a particular time range

Tags: dataframe, r

I have a data frame like this:

TimeStamp                    Category

2013-11-02 07:57:18 AM         0
2013-11-02 08:07:19 AM         0
2013-11-02 08:07:21 AM         0
2013-11-02 08:07:25 AM         1
2013-11-02 08:07:29 AM         0
2013-11-02 08:08:18 AM         0
2013-11-02 08:09:20 AM         0
2013-11-02 09:04:18 AM         0
2013-11-02 09:05:22 AM         0
2013-11-02 09:07:18 AM         0
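To make the answers below reproducible, here is a minimal sketch that rebuilds the example data. It assumes an English or C locale so that `%p` parses the AM/PM suffix, and it uses `tz = "UTC"` (an assumption, not part of the question) to avoid daylight-saving surprises.

```r
# Rebuild the example data from the question (timestamps as displayed)
df <- data.frame(
  TimeStamp = c("2013-11-02 07:57:18 AM", "2013-11-02 08:07:19 AM",
                "2013-11-02 08:07:21 AM", "2013-11-02 08:07:25 AM",
                "2013-11-02 08:07:29 AM", "2013-11-02 08:08:18 AM",
                "2013-11-02 08:09:20 AM", "2013-11-02 09:04:18 AM",
                "2013-11-02 09:05:22 AM", "2013-11-02 09:07:18 AM"),
  Category = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# %I = 12-hour clock, %p = AM/PM; tz = "UTC" is an assumption for reproducibility
df$TimeStamp <- as.POSIXct(df$TimeStamp, format = "%Y-%m-%d %I:%M:%S %p", tz = "UTC")
str(df)
```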

I want to select all rows within ±10 minutes of each row where Category is 1.

In this example, Category is 1 at 2013-11-02 08:07:25 AM, so I want to select all rows from 07:57:25 AM to 08:17:25 AM.

What is the best way to handle this task?

In addition, there may be multiple 1s within a given time frame. (The real data frame is more complicated: it contains timestamps for different users, i.e. there is another column named "UserID".)
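None of the answers below handles the UserID column, so here is a minimal base-R sketch of one way to do it: pair every row with every Category == 1 row *of the same user* via `merge()`, then keep the rows that fall within 10 minutes of at least one such event. The data and the helper column `row` are hypothetical, made up only to illustrate the grouping.

```r
# Hypothetical data: two users, one Category == 1 event each
df <- data.frame(
  UserID    = c("a", "a", "a", "b", "b", "b"),
  TimeStamp = as.POSIXct(c("2013-11-02 08:00:00", "2013-11-02 08:05:00",
                           "2013-11-02 09:00:00", "2013-11-02 08:06:00",
                           "2013-11-02 08:30:00", "2013-11-02 08:35:00"),
                         tz = "UTC"),
  Category  = c(0, 1, 0, 0, 0, 1)
)
df$row <- seq_len(nrow(df))  # hypothetical helper: original row position

# Event times per user
events <- df[df$Category == 1, c("UserID", "TimeStamp")]
names(events)[2] <- "EventTime"

# Equi-join on UserID: every row paired with every event of the same user
pairs <- merge(df, events, by = "UserID")
close <- abs(as.numeric(difftime(pairs$TimeStamp, pairs$EventTime,
                                 units = "mins"))) <= 10

# A row is kept if it is close to at least one of its own user's events
keep_rows <- sort(unique(pairs$row[close]))
df[keep_rows, ]
```

Note that user b's 08:06:00 row is dropped even though it is one minute from user a's event, because the join only compares events within the same user. The pairwise join grows with (rows × events) per user, which is fine for moderate data.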

asked Jun 24 '15 at 22:06 by zxwjames


4 Answers

In base R, without lubridate or any other package (assuming that you convert TimeStamp to a POSIXct object first), something like:

df$TimeStamp <- as.POSIXct(df$TimeStamp, format = "%Y-%m-%d %I:%M:%S %p")
df[with(df, abs(difftime(TimeStamp[Category==1],TimeStamp,units="mins")) <= 10 ),]

#            TimeStamp Category
#2 2013-11-02 08:07:19        0
#3 2013-11-02 08:07:21        0
#4 2013-11-02 08:07:25        1
#5 2013-11-02 08:07:29        0
#6 2013-11-02 08:08:18        0
#7 2013-11-02 08:09:20        0

If you have multiple 1s, you would have to loop over them and keep every row that is within 10 minutes of at least one of them:

check <- with(df, 
  lapply(TimeStamp[Category==1], function(x) abs(difftime(x,TimeStamp,units="mins")) <= 10 ) 
)
df[do.call(pmax, check)==1,]
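The same "within 10 minutes of any 1" rule can also be sketched without an explicit loop by building the full matrix of time gaps with `outer()`. This is an alternative to the answer's `lapply`/`pmax` approach, not part of it; the second event at row 9 is hypothetical, added only to exercise the multiple-1 case.

```r
ts <- as.POSIXct(c("2013-11-02 07:57:18", "2013-11-02 08:07:19",
                   "2013-11-02 08:07:21", "2013-11-02 08:07:25",
                   "2013-11-02 08:07:29", "2013-11-02 08:08:18",
                   "2013-11-02 08:09:20", "2013-11-02 09:04:18",
                   "2013-11-02 09:05:22", "2013-11-02 09:07:18"), tz = "UTC")
# Hypothetical second event at row 9 (09:05:22)
df <- data.frame(TimeStamp = ts, Category = c(0, 0, 0, 1, 0, 0, 0, 0, 1, 0))

events <- df$TimeStamp[df$Category == 1]
# n x k matrix of absolute gaps in seconds: rows = all rows, columns = events
gap <- abs(outer(as.numeric(df$TimeStamp), as.numeric(events), "-"))
# Keep a row if it is within 600 s (10 minutes) of at least one event
keep <- rowSums(gap <= 600) > 0
df[keep, ]
```

The trade-off is memory: the gap matrix has one cell per (row, event) pair, so for very large data a sorted or join-based approach scales better.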
answered Nov 20 '22 at 02:11 by thelatemail


Here's how I would approach this using data.table::foverlaps:

First, convert TimeStamp to a proper POSIXct object:

library(data.table)
setDT(df)[, TimeStamp := as.POSIXct(TimeStamp, format = "%Y-%m-%d %I:%M:%S %p")]

Then create a temporary data set of the Category == 1 rows to join against. Because foverlaps works on intervals, also add an "end" column (TimeStamp2, a copy of TimeStamp) and key by both the start and end columns:

df2 <- setkey(df[Category == 1L][, TimeStamp2 := TimeStamp], TimeStamp, TimeStamp2)

Then do the same for df, but with intervals running from 10 minutes (600 seconds) before to 10 minutes after each TimeStamp:

setkey(df[, `:=`(start = TimeStamp - 600, end = TimeStamp + 600)], start, end)

Finally, all that is left is to run foverlaps and subset df by the matched row indices:

indx <- foverlaps(df, df2, which = TRUE, nomatch = 0L)$xid
df[unique(indx), .(TimeStamp, Category)]  # unique(): a row matching several 1s is kept once
#              TimeStamp Category
# 1: 2013-11-02 08:07:19        0
# 2: 2013-11-02 08:07:21        0
# 3: 2013-11-02 08:07:25        1
# 4: 2013-11-02 08:07:29        0
# 5: 2013-11-02 08:08:18        0
# 6: 2013-11-02 08:09:20        0
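For large data, a base-R alternative to the overlap join is to compute, for every row, the distance to the *nearest* Category == 1 timestamp using `findInterval()` on the sorted event times. This is a different technique from foverlaps, sketched here under the assumption that timestamps are POSIXct; `nearest_gap()` is a hypothetical helper name.

```r
ts <- as.POSIXct(c("2013-11-02 07:57:18", "2013-11-02 08:07:19",
                   "2013-11-02 08:07:21", "2013-11-02 08:07:25",
                   "2013-11-02 08:07:29", "2013-11-02 08:08:18",
                   "2013-11-02 08:09:20", "2013-11-02 09:04:18",
                   "2013-11-02 09:05:22", "2013-11-02 09:07:18"), tz = "UTC")
category <- c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0)

# Hypothetical helper: seconds from each timestamp to the nearest event (Inf if none)
nearest_gap <- function(ts, events) {
  x  <- as.numeric(ts)
  ev <- sort(as.numeric(events))
  if (length(ev) == 0) return(rep(Inf, length(x)))
  i <- findInterval(x, ev)  # index of the last event <= x, or 0 if none
  prev_gap <- ifelse(i >= 1, x - ev[pmax(i, 1)], Inf)
  next_gap <- ifelse(i < length(ev), ev[pmin(i + 1, length(ev))] - x, Inf)
  pmin(prev_gap, next_gap)
}

keep <- nearest_gap(ts, ts[category == 1]) <= 600  # 600 s = 10 minutes
which(keep)
```

Only the closest event before and after each row needs checking, so the work is roughly O(n log k) for n rows and k events instead of one comparison per (row, event) pair.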
answered Nov 20 '22 at 03:11 by David Arenburg


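Using lubridate (as written, this assumes a single Category == 1 row, since %within% compares the timestamps against span10 element-wise):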

library(lubridate)
df$TimeStamp <- ymd_hms(df$TimeStamp)
span10 <- (df$TimeStamp[df$Category == 1] - minutes(10)) %--% (df$TimeStamp[df$Category == 1] + minutes(10))
df[df$TimeStamp %within% span10,]
            TimeStamp Category
2 2013-11-02 08:07:19        0
3 2013-11-02 08:07:21        0
4 2013-11-02 08:07:25        1
5 2013-11-02 08:07:29        0
6 2013-11-02 08:08:18        0
7 2013-11-02 08:09:20        0
answered Nov 20 '22 at 03:11 by Pierre L


This seems to work:

Data:

As per @DavidArenburg's comment (and as shown in his answer), the right way to convert the TimeStamp column to a POSIXct object (if it is not one already) is:

df$TimeStamp <- as.POSIXct(df$TimeStamp, format = "%Y-%m-%d %I:%M:%S %p")

Solution:

library(lubridate) #for minutes
library(dplyr)     #for between
pickrows <- function(df) {
  #pick category == 1 rows
  df2 <- df[df$Category==1,]
  #for each timestamp create two variables start and end
  #for +10 and -10 minutes
  #then pick rows between them
  lapply(df2$TimeStamp, function(time) {
      start <- time - minutes(10)
      end   <- time + minutes(10)
      df[between(df$TimeStamp, start, end),]
  })
} 

#run function
pickrows(df)

Output:

> pickrows(df)
[[1]]
            TimeStamp Category
2 2013-11-02 08:07:19        0
3 2013-11-02 08:07:21        0
4 2013-11-02 08:07:25        1
5 2013-11-02 08:07:29        0
6 2013-11-02 08:08:18        0
7 2013-11-02 08:09:20        0

Keep in mind that with multiple Category == 1 rows, the function returns a list with one data frame per such row (here it has only one element), so you will need do.call(rbind, pickrows(df)) to combine everything into one data.frame.
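When the ±10-minute windows overlap, the combined result repeats the shared rows once per window. A minimal base-R sketch of combining and de-duplicating; it swaps lubridate/dplyr for `difftime()` so it runs without extra packages, and the second event at 08:09:20 (row 7) is hypothetical.

```r
ts <- as.POSIXct(c("2013-11-02 07:57:18", "2013-11-02 08:07:19",
                   "2013-11-02 08:07:21", "2013-11-02 08:07:25",
                   "2013-11-02 08:07:29", "2013-11-02 08:08:18",
                   "2013-11-02 08:09:20", "2013-11-02 09:04:18",
                   "2013-11-02 09:05:22", "2013-11-02 09:07:18"), tz = "UTC")
# Hypothetical second event at row 7, so the two windows overlap
df <- data.frame(TimeStamp = ts, Category = c(0, 0, 0, 1, 0, 0, 1, 0, 0, 0))

# One data frame per Category == 1 row, like the list returned by pickrows()
windows <- lapply(df$TimeStamp[df$Category == 1], function(t) {
  df[abs(as.numeric(difftime(df$TimeStamp, t, units = "mins"))) <= 10, ]
})

combined <- do.call(rbind, windows)
# Rows shared by overlapping windows appear once per window; keep the first copy
combined <- combined[!duplicated(combined), ]
combined
```

One caveat of `duplicated()` here: rows that were already exact duplicates in the original data would also be collapsed.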

answered Nov 20 '22 at 03:11 by LyzandeR