Fast way to count events over a sliding window

Question

Suppose I have x = rnorm(100000) and instead of doing a 1000 length sliding window moving average, I wanted to do a 1000 length sliding window that counts all the times that x is above 0.2.

For example,

x <- rnorm(1004)
start <- 1:1000
record <- list()
while(start[length(start)] <= length(x)) {
    record[[length(record) + 1]] <- length(which(x[start] > 0.2))/length(start)
    start <- start + 1
    print(record[[length(record)]]);flush.console()
}

This becomes unmanegable for large length(x). What is a highly efficient method?

Martin Morgan · Accepted Answer

My contribution is to calculate the lagged difference between the cumulative sum of the condition

cumdiff = function(x) diff(c(0, cumsum( x > .2)), 20)

which along with

filt = function(x) filter(x > 0.2, rep(1, 20), sides=1)
library(TTR); ttr = function(x) runSum(x > .2, 20)
cumsub = function(x) { z <- cumsum(c(0, x>0.2)); tail(z,-20) - head(z,-20) }

performs ok

> library(microbenchmark)
> set.seed(123); xx = rnorm(100000)
> microbenchmark(cumdiff(xx), filt(xx), ttr(xx), cumsub(xx))
Unit: milliseconds
        expr       min        lq    median       uq      max neval
 cumdiff(xx) 11.192005 12.387862 12.469253 12.77588 13.72404   100
    filt(xx) 20.979503 22.058045 22.442765 23.02612 62.91730   100
     ttr(xx)  8.390923 10.023934 10.119772 10.46309 11.04173   100
  cumsub(xx)  7.015654  8.483432  8.538171  8.73596  9.65421   100

These differ in the specifics of how the result is represented (filt and ttr have leading NAs, for instance) and only filter deals with embedded NA's

> xx[22] = NA
> head(cumdiff(xx))  # NA's propagate, silently
[1]  9  9 NA NA NA NA
> ttr(xx)
Error in runSum(x > 0.2, 20) : Series contains non-leading NAs
> tail(filt(xx), -19)
 [1]  9  9 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA  8  8  9
 ...

TheComeOnMan · Answer

For example's sake I'm using a vector of length 100 and counting over a window of 20 elements

x = rnorm(100)
y <- x > 0.2
z <- filter(y, rep(1,20), sides=1)

The filter command is the trick here. filter takes the vector y and walks through each element. The rep(1,20) tells it to take the 20 elements at a time, multiply them by 1, and sum them (you could do re(1/20,20) to get the average for instance). The sides = 1 tells filter to look to the right of the element, i.e. consider the 20 elements previous to the current element.

> x
  [1] -0.561107013 -0.687590079 -0.512261471  0.335243662 -2.483070553 -0.995055193  1.711199893  0.165077559 -0.130174687
 [10]  0.671520592 -0.034204031  0.793000050 -1.407499532  0.929966998  0.464658383  1.291478397  1.122711651 -0.732289539
 [19] -0.947808154  1.601688862  0.755147769  0.962415501 -0.254079608 -0.283894519  1.091660426  0.926601058  0.249405592
 [28] -1.195970776 -0.384030894 -0.826048400  1.065626378 -1.935038267  1.464437755  0.675437142  0.201552948  0.367500849
 [37] -0.232627068 -1.472755498 -1.894668101 -1.173382831 -0.339885920 -0.248978811 -0.378980900  1.545109833  0.280077032
 [46]  0.684595064  1.867086057  0.320364730 -1.653199399 -1.365201143 -0.012793364 -0.874006845 -0.271929369 -0.119850417
 [55] -0.361720731  0.361642939  0.676907910  0.816370133  0.685732967 -0.868583473 -1.830861530  1.423500571 -0.760462112
 [64]  0.517177871  0.229760540 -0.187106487 -0.324125717 -0.515600949 -0.772414150 -1.144808696  0.003509157  0.138944467
 [73] -0.386850873  0.268739173  0.348214296  0.431076350 -0.418408389 -1.398493830 -0.331882279 -0.203976286 -0.646023792
 [82] -0.729974273 -0.744186350  0.877314843 -0.994216252  0.022162018 -1.101533562 -1.328293297  0.974986850 -0.152985355
 [91] -0.714722527  0.428360138 -0.055607165 -1.015410904  0.320052726  1.951380333  2.288315801  1.295683645 -2.494534350
[100]  1.171319898
> y
  [1] FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE
 [21]  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
 [41] FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE
 [61] FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
 [81] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
> z
Time Series:
Start = 1 
End = 100 
Frequency = 1 
  [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA  9 10 11 11 10 11 12 12 12 12 11 12 11 12 12 12 12 11 11 11 10
 [41]  9  8  8  9  9  9  9 10 10 10  9  9  8  7  6  6  7  8  9  9  9 10 10 10 10  9  8  7  7  7  7  7  7  8  9  9  8  7  6  6
 [81]  6  5  5  5  4  4  4  4  5  5  5  6  6  5  5  5  6  7  7  8

Fast way to count events over a sliding window

Tags:

performance

loops

r

count

vector

user2763361

2 Answers

Martin Morgan

TheComeOnMan

Recent Activity

Donate For Us

Fast way to count events over a sliding window

Tags:

performance

loops

r

count

vector

user2763361

2 Answers

Martin Morgan

TheComeOnMan

Related questions

Recent Activity

Donate For Us