Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to detect a gap in series data using R

Tags:

r

I have a device that inserts periodic (approx every 5 minutes) rows into a status table. Each row is considered a status event and is timestamped. I need to detect when 2 status events happen more than 10 minutes apart.

While I could use a loop solution it doesn't seem very elegant and I am looking for another answer. The table in the database can be simplified to:

12:01:00, status, ok
12:06:31, status, ok
12:12:02, status, ok
13:15:43, status, ok
13,20:33, status, ok

So I want to detect that there was a 1:03:41 gap between the 3rd and 4th status row. Needless to say I have lots of data to deal with.

like image 653
Matthew Doering Avatar asked Nov 04 '25 07:11

Matthew Doering


2 Answers

If you're working with timestamp data in POSIXct format, you can just do simple subtraction to get time difference.

Because R is vectorized, no loop is required -- it's just one vector minus another

Then it's easy to test whether the gap was more than some threshold.

# here's a data frame with a bunch of time stamps
my_dat <- data.frame(time=Sys.time() + sort(runif(10, 100, 600)))

# Take rows 1 to n-1 and subtract rows 2 to n:
my_dat$gap <- c(NA, with(my_dat, time[-1] - time[-nrow(my_dat)]))

# now, how often was the gap more than some amount of time?
gap_threshold <- 30 # let's say, 30 seconds
my_dat$over_thresh <- my_dat$gap > gap_threshold
my_dat

# result: timestamp, difference from prior row in seconds, threshold test result
# > my_dat
#                   time       gap over_thresh
# 1  2015-05-28 16:28:05        NA          NA
# 2  2015-05-28 16:28:46 40.852095        TRUE
# 3  2015-05-28 16:29:35 49.060379        TRUE
# 4  2015-05-28 16:29:55 20.290983       FALSE
# 5  2015-05-28 16:30:02  6.580322       FALSE
# 6  2015-05-28 16:30:34 32.039323        TRUE
# 7  2015-05-28 16:30:58 24.601907       FALSE
# 8  2015-05-28 16:31:16 17.761954       FALSE
# 9  2015-05-28 16:31:51 34.794329        TRUE
# 10 2015-05-28 16:32:35 44.213900        TRUE
like image 128
arvi1000 Avatar answered Nov 05 '25 19:11

arvi1000


With chron "times" class we can compare adjacent times using diff and compare that to 10 minutes:

library(chron)

Times <- times(DF[[1]])
which(c(FALSE, diff(Times) > times("00:10:00")))
## [1] 4

so the 4th point is more than 10 minutes after the prior (3rd) point.

Note: We used this input:

Lines <- "12:01:00, status, ok
12:06:31, status, ok
12:12:02, status, ok
13:15:43, status, ok
13:20:33, status, ok"
DF <- read.table(text = Lines, sep = ",", as.is = TRUE)
like image 21
G. Grothendieck Avatar answered Nov 05 '25 20:11

G. Grothendieck



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!