
Location and value for consecutive values above threshold

Tags:

r

I need to find where my data are reaching a threshold for consecutive days. I'm looking for 4 consecutive observations above the threshold. I want to return the location of the first observation of the series that meets these criteria.

Here is an example data set:

eg = structure(list(t.date = structure(c(1L, 2L, 11L, 12L, 13L, 14L, 
15L, 16L, 17L, 18L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L), .Label = c("4/30/11", 
"5/1/11", "5/10/11", "5/11/11", "5/12/11", "5/13/11", "5/14/11", 
"5/15/11", "5/16/11", "5/17/11", "5/2/11", "5/3/11", "5/4/11", 
"5/5/11", "5/6/11", "5/7/11", "5/8/11", "5/9/11"), class = "factor"), 
t.avg = c(4L, 4L, 5L, 6L, 10L, 18L, 18L, 18L, 18L, 12L, 10L, 
10L, 8L, 8L, 9L, 10L, 6L, 5L)), .Names = c("date", "avg"
), row.names = c(NA, -18L), class = "data.frame")

I want the date where avg meets the criteria (avg > 17 for 4 consecutive days). One approach:

eg$date %in% eg$date[which(eg$avg > 17)]
# [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE
# [13] FALSE FALSE FALSE FALSE FALSE FALSE

In this case I could take the first TRUE as the answer, but that would not work if the second, third, or fourth value after it were not also TRUE.

I need the first date where the condition is TRUE:

eg$date[which(eg$avg > 17)]
# [1] 5/5/11 5/6/11 5/7/11 5/8/11

And the location of the first observation in the series:

which(eg$avg > 17)
# [1] 6 7 8 9

I have found related questions but I have not been able to bend the methods to my needs.

Many thanks.

frostygoat asked Nov 21 '14 23:11


2 Answers

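library(zoo)
# Rolling minimum over windows of 4 observations; element i covers rows i to i+3,
# so the first window whose minimum exceeds 17 gives the start of the run
xx <- which(rollapply(eg$avg, 4, min) > 17)[1]
# Get the date
eg$date[xx]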
Jordan answered Nov 15 '22 10:11
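
If zoo is not available, the same rolling-window idea can be sketched in base R with embed(), which lays out every window of k consecutive values as one row of a matrix. The helper name first_run_start and its arguments are made up for illustration and are not part of the answer above; treat this as a minimal sketch rather than a tested replacement.

```r
# Hypothetical helper: index of the first element that starts a run of
# at least `k` consecutive values strictly above `threshold` (NA if none).
first_run_start <- function(x, threshold, k) {
  if (length(x) < k) return(NA_integer_)   # embed() needs at least k values
  above <- as.integer(x > threshold)       # 1 where the value is above the threshold
  # Row i of embed(above, k) holds the flags for x[i], ..., x[i + k - 1]
  full <- rowSums(embed(above, k)) == k    # TRUE where the whole window is above
  which(full)[1]                           # first such window, NA if there is none
}

# Same values as eg$avg in the question
avg <- c(4, 4, 5, 6, 10, 18, 18, 18, 18, 12, 10, 10, 8, 8, 9, 10, 6, 5)
first_run_start(avg, 17, 4)
# [1] 6
```

With the question's data frame, eg$date[first_run_start(eg$avg, 17, 4)] would then give the matching date.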


Use run-length encoding (rle):

> rle(eg$avg > 17)
Run Length Encoding
  lengths: int [1:3] 5 4 9
  values : logi [1:3] FALSE TRUE FALSE

rleg <- rle(eg$avg > 17)
rleg$lengths[!rleg$values][1]  # returns 5 (length of the first FALSE run), so add one to it
# This shortcut only works here because it never tests the length of the run > 17.
# General rule: if the first 4 values are all > 17, return 1;
# otherwise return 1 + the sum of the run lengths before the first TRUE run of length >= 4.

# The code to do that:

first_hit <- which(rleg$values & rleg$lengths >= 4)[1]  # first qualifying run (NA if there is none)
sum(rleg$lengths[seq_len(first_hit - 1)]) + 1           # observations before that run, plus one
#[1] 6
IRTFM answered Nov 15 '22 11:11
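
The rule spelled out in the comments of the rle answer (one plus the total length of the runs before the first qualifying run) can be wrapped in a small function so that the edge cases are explicit. This is only a sketch: the name rle_run_start and its arguments are hypothetical, and returning NA when no run qualifies is an assumption about what the asker would want.

```r
# Hypothetical helper: start index of the first run of at least `k`
# consecutive values strictly above `threshold`, computed from rle().
rle_run_start <- function(x, threshold, k) {
  r <- rle(x > threshold)
  hit <- which(r$values & r$lengths >= k)      # runs that are TRUE and long enough
  if (length(hit) == 0) return(NA_integer_)    # assumption: NA when nothing qualifies
  starts <- cumsum(c(1L, head(r$lengths, -1))) # first index of every run
  starts[hit[1]]
}

# Same values as eg$avg in the question
avg <- c(4, 4, 5, 6, 10, 18, 18, 18, 18, 12, 10, 10, 8, 8, 9, 10, 6, 5)
rle_run_start(avg, 17, 4)
# [1] 6

# A short run above the threshold before the qualifying one is skipped
rle_run_start(c(18, 18, 1, 18, 18, 18, 18), 17, 4)
# [1] 4
```

Computing the start of every run with cumsum() avoids special-casing a qualifying run at position 1, since the first run always starts at index 1.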