I have a data frame like:
TimeStamp Category
2013-11-02 07:57:18 AM 0
2013-11-02 08:07:19 AM 0
2013-11-02 08:07:21 AM 0
2013-11-02 08:07:25 AM 1
2013-11-02 08:07:29 AM 0
2013-11-02 08:08:18 AM 0
2013-11-02 08:09:20 AM 0
2013-11-02 09:04:18 AM 0
2013-11-02 09:05:22 AM 0
2013-11-02 09:07:18 AM 0
What I want to do is select the rows within a ±10 minute window around each row where Category is 1.
In this case, because Category = 1 at 2013-11-02 08:07:25 AM, I want to select all rows from 07:57:25 AM to 08:17:25 AM.
What is the best way to handle this task?
In addition, there may be multiple 1s in each time frame. (The real data frame is more complicated: it contains multiple TimeStamps for different users, i.e. there is another column named "UserID".)
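To make the example reproducible, here is a sketch that rebuilds the sample data and computes the window for the single Category == 1 row. The tz = "UTC" argument is an assumption added so the output does not depend on the local time zone, and parsing "%p" assumes an English or C locale.

```r
# Rebuild the sample data from the question.
# tz = "UTC" is an assumption; "%p" (AM/PM) parsing assumes an English or C locale.
df <- data.frame(
  TimeStamp = c("2013-11-02 07:57:18 AM", "2013-11-02 08:07:19 AM",
                "2013-11-02 08:07:21 AM", "2013-11-02 08:07:25 AM",
                "2013-11-02 08:07:29 AM", "2013-11-02 08:08:18 AM",
                "2013-11-02 08:09:20 AM", "2013-11-02 09:04:18 AM",
                "2013-11-02 09:05:22 AM", "2013-11-02 09:07:18 AM"),
  Category = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L)
)
df$TimeStamp <- as.POSIXct(df$TimeStamp, format = "%Y-%m-%d %I:%M:%S %p", tz = "UTC")

# POSIXct arithmetic is in seconds, so 10 minutes = 600 seconds.
t1 <- df$TimeStamp[df$Category == 1]
win_start <- t1 - 600
win_end <- t1 + 600
format(win_start, "%H:%M:%S")  # "07:57:25"
format(win_end, "%H:%M:%S")    # "08:17:25"
```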
In base R, without lubridate or anything else (assuming you first convert TimeStamp to a POSIXct object), like:
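# convert the character timestamps to POSIXct
df$TimeStamp <- as.POSIXct(df$TimeStamp, format = "%Y-%m-%d %I:%M:%S %p")
# the single Category == 1 time is recycled against every TimeStamp
df[with(df, abs(difftime(TimeStamp[Category == 1], TimeStamp, units = "mins")) <= 10), ]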
# TimeStamp Category
#2 2013-11-02 08:07:19 0
#3 2013-11-02 08:07:21 0
#4 2013-11-02 08:07:25 1
#5 2013-11-02 08:07:29 0
#6 2013-11-02 08:08:18 0
#7 2013-11-02 08:09:20 0
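The difftime() call works because the single Category == 1 timestamp is recycled against the whole TimeStamp column. Since POSIXct values are stored as seconds, the same filter can be sketched with plain numeric arithmetic. This is an illustrative equivalent, not the answer's code; the 24-hour strings and tz = "UTC" are assumptions (all sample times are AM, so they denote the same instants).

```r
# Sample data from the question, written with 24-hour times (all are AM).
# tz = "UTC" is an assumption for reproducibility.
df <- data.frame(
  TimeStamp = as.POSIXct(paste("2013-11-02", c(
    "07:57:18", "08:07:19", "08:07:21", "08:07:25", "08:07:29",
    "08:08:18", "08:09:20", "09:04:18", "09:05:22", "09:07:18")), tz = "UTC"),
  Category = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L)
)

# POSIXct is seconds since the epoch, so +-10 minutes is +-600 seconds.
secs <- as.numeric(df$TimeStamp)
t1 <- secs[df$Category == 1]   # assumes exactly one Category == 1 row
keep <- abs(secs - t1) <= 600
df[keep, ]                     # rows 2 to 7, as in the output above
```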
If you've got multiple 1's, you'd have to loop over them like:
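# one logical vector per Category == 1 row: TRUE where within 10 minutes
check <- with(df,
  lapply(TimeStamp[Category == 1], function(x) abs(difftime(x, TimeStamp, units = "mins")) <= 10)
)
# keep a row if it falls inside any of the windows
df[Reduce(`|`, check), ]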
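For many Category == 1 rows, the lapply() loop can also be replaced by a single outer() call that builds every pairwise time difference at once. This is a swapped-in base R technique, not part of the answer above; it uses memory proportional to rows times events, and marking row 9 as a second 1 is a hypothetical change to the sample data.

```r
# Sample data with a hypothetical second Category == 1 row (row 9).
df <- data.frame(
  TimeStamp = as.POSIXct(paste("2013-11-02", c(
    "07:57:18", "08:07:19", "08:07:21", "08:07:25", "08:07:29",
    "08:08:18", "08:09:20", "09:04:18", "09:05:22", "09:07:18")), tz = "UTC"),
  Category = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L)
)

secs <- as.numeric(df$TimeStamp)
events <- secs[df$Category == 1]

# n x m matrix of absolute differences in seconds:
# rows = all rows of df, columns = the Category == 1 rows
d <- abs(outer(secs, events, "-"))

# keep a row if it is within 600 seconds of at least one event
keep <- rowSums(d <= 600) > 0
df[keep, ]
```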
Here's how I would approach this using data.table::foverlaps.
First, convert TimeStamp to a proper POSIXct:
library(data.table)
setDT(df)[, TimeStamp := as.POSIXct(TimeStamp, format = "%Y-%m-%d %I:%M:%S %p")]
Then we create a temporary data set of the Category == 1 rows to join against. We also add an "end" column (TimeStamp2, a copy of TimeStamp) and key by both the "start" and "end" columns:
df2 <- setkey(df[Category == 1L][, TimeStamp2 := TimeStamp], TimeStamp, TimeStamp2)
Then we do the same for df, but with start and end set 10 minutes (600 seconds) before and after each TimeStamp:
setkey(df[, `:=`(start = TimeStamp - 600, end = TimeStamp + 600)], start, end)
Then all that's left to do is run foverlaps and subset by the matched row indices:
indx <- foverlaps(df, df2, which = TRUE, nomatch = 0L)$xid
# unique() avoids duplicate rows when a row falls inside several windows
df[unique(indx), .(TimeStamp, Category)]
# TimeStamp Category
# 1: 2013-11-02 08:07:19 0
# 2: 2013-11-02 08:07:21 0
# 3: 2013-11-02 08:07:25 1
# 4: 2013-11-02 08:07:29 0
# 5: 2013-11-02 08:08:18 0
# 6: 2013-11-02 08:09:20 0
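foverlaps() treats each row's [TimeStamp - 600, TimeStamp + 600] as an interval and each Category == 1 time as a zero-width interval, keeping rows where the two overlap. If data.table isn't available, a comparable result can be sketched in base R by counting how many sorted event times fall inside each window with findInterval(). This is a swapped-in technique, not what the answer uses, and the second 1 on row 9 is hypothetical.

```r
# Sample data with a hypothetical second Category == 1 row (row 9).
df <- data.frame(
  TimeStamp = as.POSIXct(paste("2013-11-02", c(
    "07:57:18", "08:07:19", "08:07:21", "08:07:25", "08:07:29",
    "08:08:18", "08:09:20", "09:04:18", "09:05:22", "09:07:18")), tz = "UTC"),
  Category = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L)
)

secs <- as.numeric(df$TimeStamp)
events <- sort(secs[df$Category == 1])   # findInterval needs sorted breakpoints

# number of event times <= t + 600
hi <- findInterval(secs + 600, events)
# number of event times < t - 600 (left.open = TRUE makes the comparison strict)
lo <- findInterval(secs - 600, events, left.open = TRUE)

# events in [t - 600, t + 600] = hi - lo; keep rows with at least one
keep <- (hi - lo) > 0
df[keep, ]
```

Each lookup is a binary search, so this scales to many rows and events without building a full difference matrix.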
Using lubridate:
library(lubridate)
df$TimeStamp <- ymd_hms(df$TimeStamp)
span10 <- (df$TimeStamp[df$Category == 1] - minutes(10)) %--% (df$TimeStamp[df$Category == 1] + minutes(10))
df[df$TimeStamp %within% span10,]
TimeStamp Category
2 2013-11-02 08:07:19 0
3 2013-11-02 08:07:21 0
4 2013-11-02 08:07:25 1
5 2013-11-02 08:07:29 0
6 2013-11-02 08:08:18 0
7 2013-11-02 08:09:20 0
This seems to work:
Data:
As per @DavidArenburg's comment (and as mentioned in his answer), the right way to convert the timestamp column into a POSIXct object (if it isn't one already) is:
df$TimeStamp <- as.POSIXct(df$TimeStamp, format = "%Y-%m-%d %I:%M:%S %p")
Solution:
library(lubridate) #for minutes
library(dplyr) #for between
pickrows <- function(df) {
#pick category == 1 rows
df2 <- df[df$Category==1,]
#for each timestamp create two variables start and end
#for +10 and -10 minutes
#then pick rows between them
lapply(df2$TimeStamp, function(time) {
start <- time - minutes(10)
end <- time + minutes(10)
df[between(df$TimeStamp, start, end),]
})
}
#run function
pickrows(df)
Output:
> pickrows(df)
[[1]]
TimeStamp Category
2 2013-11-02 08:07:19 0
3 2013-11-02 08:07:21 0
4 2013-11-02 08:07:25 1
5 2013-11-02 08:07:29 0
6 2013-11-02 08:08:18 0
7 2013-11-02 08:09:20 0
Keep in mind that my function returns a list with one element per Category == 1 row (here it has only one element), so do.call(rbind, pickrows(df)) is needed to combine everything into one data.frame. If windows overlap, the same row can appear more than once; wrapping the result in unique() removes the duplicates.
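Since the real data has a UserID column, the windows should only match events from the same user. A minimal base R sketch: split by user, apply the window test inside each group, and combine the per-user results with do.call(rbind, ...). The UserID values and times below are invented for illustration, and within_window() is a hypothetical helper name.

```r
# Hypothetical data: the question mentions UserID but shows no values.
df <- data.frame(
  UserID = c("a", "a", "a", "a", "b", "b", "b"),
  TimeStamp = as.POSIXct(paste("2013-11-02", c(
    "08:00:00", "08:07:25", "08:15:00", "08:30:00",
    "08:05:00", "08:10:00", "08:20:00")), tz = "UTC"),
  Category = c(0L, 1L, 0L, 0L, 0L, 0L, 1L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows within `width` seconds of any
# Category == 1 row in the same (single-user) data frame.
within_window <- function(d, width = 600) {
  secs <- as.numeric(d$TimeStamp)
  events <- secs[d$Category == 1]
  if (length(events) == 0) return(d[0, ])   # user with no 1s
  keep <- rowSums(abs(outer(secs, events, "-")) <= width) > 0
  d[keep, ]
}

# apply per user, then stack the per-user results
res <- do.call(rbind, lapply(split(df, df$UserID), within_window))
rownames(res) <- NULL
res
```

Because each row is tested once against all of its user's events, a row inside overlapping windows is returned only once. Without the split, user a's 1 at 08:07:25 would also pull in user b's rows at 08:05:00 and 08:10:00.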