Select rows within a particular time range

Question

I have a data frame like:

TimeStamp                    Category

2013-11-02 07:57:18 AM         0
2013-11-02 08:07:19 AM         0
2013-11-02 08:07:21 AM         0
2013-11-02 08:07:25 AM         1
2013-11-02 08:07:29 AM         0
2013-11-02 08:08:18 AM         0
2013-11-02 08:09:20 AM         0
2013-11-02 09:04:18 AM         0
2013-11-02 09:05:22 AM         0
2013-11-02 09:07:18 AM         0

What I want to do is select every row that falls within ±10 minutes of a row where Category is 1.

In this case, because the Category = 1 row is at 2013-11-02 08:07:25 AM, I want to select all rows between 07:57:25 AM and 08:17:25 AM.

What is the best way to handle this task?

In addition, there may be multiple 1s in each time frame. (The real data frame is more complicated: it contains TimeStamp values for many different users, i.e. there is another column named "UserID".)
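For anyone who wants to reproduce the answers, here is a minimal sketch that rebuilds the sample as a data frame and works out the requested window. The `tz = "UTC"` argument and the names `event`, `window_start` and `window_end` are assumptions for illustration; the question does not state a time zone.

```r
# Rebuild the ten sample rows from the question.
df <- data.frame(
  TimeStamp = c(
    "2013-11-02 07:57:18 AM", "2013-11-02 08:07:19 AM",
    "2013-11-02 08:07:21 AM", "2013-11-02 08:07:25 AM",
    "2013-11-02 08:07:29 AM", "2013-11-02 08:08:18 AM",
    "2013-11-02 08:09:20 AM", "2013-11-02 09:04:18 AM",
    "2013-11-02 09:05:22 AM", "2013-11-02 09:07:18 AM"
  ),
  Category = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# %I is the 12-hour clock and %p the AM/PM marker.
# tz = "UTC" is an assumption made so the result does not depend on the machine.
df$TimeStamp <- as.POSIXct(df$TimeStamp,
                           format = "%Y-%m-%d %I:%M:%S %p", tz = "UTC")

# The requested window around the single Category == 1 row (600 seconds each way).
event <- df$TimeStamp[df$Category == 1]
window_start <- event - 600
window_end <- event + 600
```

Note that `%p` is matched against the current locale's AM/PM strings, so parsing may return NA in a locale that does not use "AM" and "PM".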

thelatemail · Accepted Answer

In base R, without lubridate or any other package (assuming you first convert TimeStamp to a POSIXct object):

df$TimeStamp <- as.POSIXct(df$TimeStamp, format = "%Y-%m-%d %I:%M:%S %p")
df[with(df, abs(difftime(TimeStamp[Category==1],TimeStamp,units="mins")) <= 10 ),]

#            TimeStamp Category
#2 2013-11-02 08:07:19        0
#3 2013-11-02 08:07:21        0
#4 2013-11-02 08:07:25        1
#5 2013-11-02 08:07:29        0
#6 2013-11-02 08:08:18        0
#7 2013-11-02 08:09:20        0

If you've got multiple 1s, you'd have to loop over them, like:

check <- with(df, 
  lapply(TimeStamp[Category==1], function(x) abs(difftime(x,TimeStamp,units="mins")) <= 10 ) 
)
df[do.call(pmax, check)==1,]
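The question also mentions a UserID column. The following is only a sketch of one way to apply the same ±10-minute test per user in base R, so that one user's events never select another user's rows. The data and the helper name `near_event` are hypothetical, and `outer()` is used here instead of the `lapply()`/`pmax` combination above.

```r
# Hypothetical data: two users; user "a" has two Category == 1 rows.
df <- data.frame(
  UserID = c("a", "a", "a", "a", "a", "b", "b", "b"),
  TimeStamp = as.POSIXct(c(
    "2013-11-02 07:50:00", "2013-11-02 08:00:00", "2013-11-02 08:05:00",
    "2013-11-02 08:30:00", "2013-11-02 08:45:00",
    "2013-11-02 08:00:00", "2013-11-02 09:00:00", "2013-11-02 09:08:00"
  ), tz = "UTC"),
  Category = c(0, 1, 0, 1, 0, 0, 1, 0),
  stringsAsFactors = FALSE
)

# TRUE for every row of `d` within `mins` minutes of any Category == 1 row.
near_event <- function(d, mins = 10) {
  events <- d$TimeStamp[d$Category == 1]
  if (length(events) == 0) return(rep(FALSE, nrow(d)))
  # Rows index timestamps, columns index events; entries are gaps in seconds.
  gaps <- abs(outer(as.numeric(d$TimeStamp), as.numeric(events), "-"))
  rowSums(gaps <= mins * 60) > 0
}

# Evaluate each user separately, writing the flags back by row position.
keep <- logical(nrow(df))
for (idx in split(seq_len(nrow(df)), df$UserID)) {
  keep[idx] <- near_event(df[idx, ])
}
result <- df[keep, ]
```

In this made-up data, user "b" has a row at 08:00:00, the same instant as one of user "a"'s events, and it is not selected because the test is run within each user only. `outer()` builds a rows-by-events matrix, so memory use grows with the product of the two counts within each user.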

David Arenburg · Answer

Here's how I would approach this using data.table::foverlaps.

First, convert TimeStamp to a proper POSIXct:

library(data.table)
setDT(df)[, TimeStamp := as.POSIXct(TimeStamp, format = "%Y-%m-%d %I:%M:%S %p")]

Then we will create a temporary data set of the rows where Category == 1 to join against. We will also add a second column, TimeStamp2, to act as the "end" of the interval, and key by both TimeStamp and TimeStamp2:

df2 <- setkey(df[Category == 1L][, TimeStamp2 := TimeStamp], TimeStamp, TimeStamp2)

Then we will do the same for df, but with 10-minute intervals on either side of each TimeStamp:

setkey(df[, `:=`(start = TimeStamp - 600, end = TimeStamp + 600)], start, end)

Then all that is left to do is run foverlaps and subset by the matched rows:

indx <- foverlaps(df, df2, which = TRUE, nomatch = 0L)$xid
df[indx, .(TimeStamp,  Category)]
#              TimeStamp Category
# 1: 2013-11-02 08:07:19        0
# 2: 2013-11-02 08:07:21        0
# 3: 2013-11-02 08:07:25        1
# 4: 2013-11-02 08:07:29        0
# 5: 2013-11-02 08:08:18        0
# 6: 2013-11-02 08:09:20        0

Pierre L · Answer

Using lubridate:

library(lubridate)
df$TimeStamp <- ymd_hms(df$TimeStamp)
span10 <- (df$TimeStamp[df$Category == 1] - minutes(10)) %--% (df$TimeStamp[df$Category == 1] + minutes(10))
df[df$TimeStamp %within% span10,]
            TimeStamp Category
2 2013-11-02 08:07:19        0
3 2013-11-02 08:07:21        0
4 2013-11-02 08:07:25        1
5 2013-11-02 08:07:29        0
6 2013-11-02 08:08:18        0
7 2013-11-02 08:09:20        0

LyzandeR · Answer

This seems to work:

Data:

As per @DavidArenburg's comment (and as mentioned in his answer), the right way to convert the TimeStamp column into a POSIXct object, if it is not one already, is:

df$TimeStamp <- as.POSIXct(df$TimeStamp, format = "%Y-%m-%d %I:%M:%S %p")

Solution:

library(lubridate) #for minutes
library(dplyr)     #for between
pickrows <- function(df) {
  #pick category == 1 rows
  df2 <- df[df$Category==1,]
  #for each timestamp create two variables start and end
  #for +10 and -10 minutes
  #then pick rows between them
  lapply(df2$TimeStamp, function(time) {
      start <- time - minutes(10)
      end   <- time + minutes(10)
      df[between(df$TimeStamp, start, end),]
  })
} 

#run function
pickrows(df)

Output:

> pickrows(df)
[[1]]
            TimeStamp Category
2 2013-11-02 08:07:19        0
3 2013-11-02 08:07:21        0
4 2013-11-02 08:07:25        1
5 2013-11-02 08:07:29        0
6 2013-11-02 08:08:18        0
7 2013-11-02 08:09:20        0

Keep in mind that my function returns a list with one element per Category==1 row (on this occasion it has only one element), so with multiple Category==1 rows a do.call(rbind, pickrows(df)) will be needed to combine everything into one data.frame.
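One thing to watch when combining: if two Category==1 rows are less than 20 minutes apart, their windows overlap and a plain rbind repeats the shared rows. Below is a base R sketch of the issue and of one possible workaround that collects row numbers first. The data and the names `event_rows`, `hits`, `stacked` and `combined` are hypothetical, and this is not part of the function above.

```r
# Hypothetical data: two Category == 1 rows whose windows overlap.
df <- data.frame(
  TimeStamp = as.POSIXct(c(
    "2013-11-02 08:00:00", "2013-11-02 08:05:00", "2013-11-02 08:12:00",
    "2013-11-02 08:20:00", "2013-11-02 08:40:00"
  ), tz = "UTC"),
  Category = c(0, 1, 1, 0, 0)
)

# For each event row, collect the row numbers inside its +/- 10 minute window.
event_rows <- which(df$Category == 1)
hits <- lapply(event_rows, function(r) {
  t <- df$TimeStamp[r]
  which(df$TimeStamp >= t - 600 & df$TimeStamp <= t + 600)
})

# Naive stacking repeats rows that sit inside more than one window.
stacked <- do.call(rbind, lapply(hits, function(i) df[i, ]))

# Taking the union of row numbers first keeps each row once, in original order.
combined <- df[sort(unique(unlist(hits))), ]
```

Here `stacked` has more rows than `combined` because the two rows at 08:05:00 and 08:12:00 fall inside both windows. Deduplicating by row number rather than with unique() on the data frame avoids collapsing rows that are genuinely identical in the source data.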

Tags: dataframe, r

zxwjames
