Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R Find the frequency and duration a wave is above a given value using conditional in data.table

An MRE is pasted below

MRE

date<-c('2001-01-01','2001-01-01','2001-01-01','2001-01-01','2001-01-01','2001-01-01','2001-01-01','2001-01-01','2001-01-01','2001-01-01','2001-01-01','2001-01-01','2001-01-01','2001-01-01','2001-01-01','2001-01-01','2001-01-01','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02')
time<-c('07:00:00 GMT','08:00:00 GMT','09:00:00 GMT','10:00:00 GMT','11:00:00 GMT','12:00:00 GMT','13:00:00 GMT','14:00:00 GMT','15:00:00 GMT','16:00:00 GMT','17:00:00 GMT', '18:00:00 GMT','19:00:00 GMT','20:00:00 GMT','21:00:00 GMT','22:00:00 GMT','23:00:00 GMT','00:00:00 GMT', '01:00:00 GMT','02:00:00 GMT','03:00:00 GMT','04:00:00 GMT','05:00:00 GMT','06:00:00 GMT','07:00:00 GMT','08:00:00 GMT','09:00:00 GMT','10:00:00 GMT','11:00:00 GMT','12:00:00 GMT','13:00:00 GMT','14:00:00 GMT','15:00:00 GMT','16:00:00 GMT','17:00:00 GMT','18:00:00 GMT','19:00:00 GMT','20:00:00 GMT','21:00:00 GMT')
el<-c(0.257,0.687,1.861,3.288, 4.821,6.172,7.048,7.258,6.799,5.654,4.463,3.443,2.704,2.708,3.328,4.23,5.244,5.985,6.317,6.074,5.234,3.981,2.662,1.615,0.88,0.746,1.405,2.527,3.928,5.283,6.517,7.179,7.252,6.625,5.454,4.214,3.144,2.491,2.357)
Time<-as.POSIXct(paste(date, time),tz="GMT")
wave<-data.table(Time, el)
ggplot(wave, aes(wave$Time, wave$el)) + geom_point() + labs(x="time", y="elevation") + geom_hline(aes(yintercept=4))

I have a wave time series and I want to be able to have a function that is able to tell me the frequency and mean/median duration the wave is above a given elevation. In my example I have chosen 4.

I want to interpolate the time when the wave reaches 4 on the rising and falling edges and find the time difference between the two points for each wave.

I can do this with a for loop, but I think that I should be able to do it in data.table much faster. I have 1mil+ points for several locations and do not think a for loop would be efficient.

For the rising wave I want to do something like:

wave[,timeIs4:=ifelse(elev<3 & elev[+1]>4,TRUE,FALSE )]

But instead of TRUE put in my interpolation calculation. I do not know how to access preceding and proceeding values within a data table such as in a for loop i+1 or i-1.

Desired Output

Rising leg I want to interpolate between points 4 and 5; 15 and 16; 29 and 30.

Falling leg I want to interpolate between points 11 and 12; 21 and 22; 36 and 37

Approximate Outcome

Rising      Falling
10:28:00    17:27:00
21:45:00    3:59:00
11:03:00    18:12:00

Then I will be able to subtract Rising from Falling using difftime() to determine the amount of time the water level was above the given elevation.

This will give me the frequency and duration the water is above the given elevation.

like image 310
Jeff Tilton Avatar asked Sep 12 '15 20:09

Jeff Tilton


2 Answers

Here's a possible solution using the devel version from GH. You will need it for both the shift function (as mentioned by @Jan) and fir the new dcast method which accepts expressions. Also, you don't have minutes in your MRE, so not sure where did you get those in your expected output.

Anyway, for starters, we will create an index (we will call it Wave so you will know from which wave # its coming from) that will tell us if the wave is rising or falling using shift. Then, we will dcast on matched values while removing the unmatched using na.omit (you can reorder the column names later if you like using the setcolorder function)

library(data.table) ## V 1.9.5+
dt[elev <= 4 & shift(elev, type = "lead") > 4, Wave := "Rising"]
dt[elev > 4 & shift(elev, type = "lead") <= 4, Wave := "Falling"]
dcast(na.omit(dt), cumsum(Wave == "Rising") ~ Wave, value.var = "time")
#    Wave             Falling              Rising
# 1:    1 2001-01-01 17:00:00 2001-01-01 10:00:00
# 2:    2 2001-01-02 03:00:00 2001-01-01 21:00:00
# 3:    3 2001-01-02 18:00:00 2001-01-02 11:00:00
like image 162
David Arenburg Avatar answered Oct 14 '22 19:10

David Arenburg


Here is another possible idea:

elev = 4    

#a helper function to calculate elapsed time
ff = function(el1, el2, el, time1, time2) 
            time1 + ((el - el1) / (el2 - el1)) * (time2 - time1)    

dif = diff(findInterval(wave$el, c(-Inf, elev, Inf)))
ris = which(dif == 1)  #risings
fal = which(dif == -1)  #fallings

ff(wave$el[ris], wave$el[ris + 1], elev, wave$Time[ris], wave$Time[ris + 1])
#[1] "2001-01-01 10:27:52 GMT" "2001-01-01 21:44:42 GMT" "2001-01-02 11:03:11 GMT"
ff(wave$el[fal], wave$el[fal + 1], elev, wave$Time[fal], wave$Time[fal + 1])
#[1] "2001-01-01 17:27:14 GMT" "2001-01-02 03:59:05 GMT" "2001-01-02 18:12:00 GMT"
like image 39
alexis_laz Avatar answered Oct 14 '22 19:10

alexis_laz