
Compute running mean with tapered windows

Given a (dummy) vector

index=log(seq(10,20,by=0.5))

I want to compute the running mean with a centered window that tapers at each end, i.e. the first entry is left untouched, the second is the average over a window of size 3, and so on until the specified window size is reached.

The answers given here: Calculating moving average, all seem to lose the start and end of the vector where the full window does not fit (returning NA there, or a shorter vector), for example:

ma <- function(x,n=5){filter(x,rep(1/n,n), sides=2)}

ma(index)

Time Series:
Start = 1 
End = 21 
Frequency = 1 
[1]       NA       NA 2.395822 2.440451 2.483165 2.524124 2.563466 2.601315
[9] 2.637779 2.672957 2.706937 2.739798 2.771611 2.802441 2.832347 2.861383
[17] 2.889599 2.917039 2.943746       NA       NA

The same goes for

rollmean(index,5)

from the zoo package

Is there a quick way of implementing tapered windows without resorting to coding up loops?

Adrian Tompkins asked Mar 14 '18 09:03


2 Answers

As rollapply can be quite slow, it is often worth writing a simple bespoke function...

tapermean <- function(x, width = 5) {        # width is assumed to be odd
  taper <- pmin(width,
                2 * seq_along(x) - 1,
                2 * rev(seq_along(x)) - 1)   # window size at each position
  lower <- seq_along(x) - (taper - 1) / 2    # lower index for each mean
  upper <- lower + taper                     # upper index for each mean
  x <- c(0, cumsum(x))                       # cumulative sums, computed once
  (x[upper] - x[lower]) / taper              # window sums divided by window sizes
}
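
To see what the function is doing, it can help to print the taper pattern for the question's vector and to compare the result with a direct, window-by-window computation. The sketch below repeats the definition of tapermean so that it runs on its own; `brute` is only an illustrative name, and an odd width is assumed.

```r
tapermean <- function(x, width = 5) {
  k <- seq_along(x)
  taper <- pmin(width, 2 * k - 1, 2 * rev(k) - 1)  # window size at each position
  lower <- k - (taper - 1) / 2
  upper <- lower + taper
  cs <- c(0, cumsum(x))                            # cumulative sums, computed once
  (cs[upper] - cs[lower]) / taper
}

index <- log(seq(10, 20, by = 0.5))

# window sizes used at each position for width = 5
pmin(5, 2 * seq_along(index) - 1, 2 * rev(seq_along(index)) - 1)
# [1] 1 3 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 3 1

# direct computation for comparison: mean over i-h .. i+h, with h shrinking at the ends
brute <- sapply(seq_along(index), function(i) {
  h <- min(2, i - 1, length(index) - i)
  mean(index[(i - h):(i + h)])
})

all.equal(tapermean(index), brute)
# [1] TRUE
```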

This is over 200x faster than the rollapply solution...

library(microbenchmark)
library(zoo) #for rollapply
index <- log(seq(10,200,by=0.5)) #longer version for testing
w <- c(seq(1,5,2),rep(5,length(index)-5-1),seq(5,1,-2)) #as in Scarabee's answer

microbenchmark(tapermean(index),
               rollapply(index,w,mean))

Unit: microseconds
                   expr       min         lq       mean     median        uq       max neval
       tapermean(index)   185.562   193.9405   246.4123   210.6965   284.548   590.197   100
rollapply(index,w,mean) 48213.027 49681.0715 52053.7352 50583.4320 51756.378 97187.538   100

I rest my case!

Andrew Gustar answered Oct 13 '22 23:10


The width argument of zoo::rollapply can be a numeric vector.

Hence, in your example, you can use:

rollapply(index, c(1, 3, 5, rep(5, 15), 5, 3, 1), mean)
#  [1] 2.302585 2.350619 2.395822 2.440451 2.483165 2.524124 2.563466 2.601315 2.637779 2.672957 2.706937 2.739798 2.771611 2.802441 2.832347 2.861383
# [17] 2.889599 2.917039 2.943746 2.970195 2.995732

More generally, if the window size n is an odd integer smaller than length(index), you can use:

w <- c(seq(1, n, 2), rep(n, length(index) - n - 1), seq(n, 1, -2))
rollapply(index, w, mean)
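
A quick sanity check of this construction, using the question's vector and assuming n is odd and smaller than length(index): the width vector should have one entry per element and ramp up and down in steps of 2. The value n = 7 is chosen only for illustration.

```r
index <- log(seq(10, 20, by = 0.5))
n <- 7  # example window size; assumed odd and smaller than length(index)

w <- c(seq(1, n, 2), rep(n, length(index) - n - 1), seq(n, 1, -2))

length(w) == length(index)
# [1] TRUE
head(w, 5)
# [1] 1 3 5 7 7
tail(w, 5)
# [1] 7 7 5 3 1
```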

Edit: If you care about performance, you can use a custom Rcpp function (it assumes an odd window that is no longer than x):

library(Rcpp)

cppFunction("NumericVector fasttapermean(NumericVector x, const int window = 5) {
  const int n = x.size();
  NumericVector y(n);

  double s = x[0];
  int w = 1;

  for (int i = 0; i < n; i++) {
    y[i] = s/w;
    if (i == n - 1) break;  // last element done; skip the update, which would read past the end of x
    if (i < window/2) {
      s += x[i + (w+1)/2] + x[i + (w+3)/2];
      w += 2;
    } else if (i > n - window/2 - 2) {
      s -= x[i - (w-1)/2] + x[i - (w-3)/2];
      w -= 2;
    } else {
      s += x[i + (w+1)/2] - x[i - (w-1)/2];
    }
  }

  return y;
}")

New benchmark:

n <- 5
index <- log(seq(10, 200, by = .5))
w <- c(seq(1, n, 2), rep(n, length(index) - n - 1), seq(n, 1, -2))

bench::mark(
  fasttapermean(index),
  tapermean(index),
  zoo::rollapply(index, w, mean)
)
# # A tibble: 3 x 14
#   expression                          min     mean   median      max `itr/sec` mem_alloc  n_gc n_itr total_time result      memory              time     gc
#   <chr>                          <bch:tm> <bch:tm> <bch:tm> <bch:tm>     <dbl> <bch:byt> <dbl> <int>   <bch:tm> <list>      <list>              <list>   <list>
# 1 fasttapermean(index)              4.7us   5.94us   5.56us   67.6us  168264.     5.52KB     0 10000     59.4ms <dbl [381]> <Rprofmem [2 x 3]>  <bch:tm> <tibble [10,000 x 3]>
# 2 tapermean(index)                 53.9us  79.68us  91.08us  405.8us   12550.    37.99KB     3  5951    474.2ms <dbl [381]> <Rprofmem [16 x 3]> <bch:tm> <tibble [5,954 x 3]>
# 3 zoo::rollapply(index, w, mean)   12.8ms  15.42ms  14.31ms   29.2ms      64.9  100.58KB     8    23    354.7ms <dbl [381]> <Rprofmem [44 x 3]> <bch:tm> <tibble [31 x 3]>

However, if you care about (extreme) precision, you should use the rollapply method, because R's built-in mean algorithm is more accurate than the naive sum-and-divide approach.
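
One rough way to get a feel for the size of the effect is to compare window means obtained from cumulative sums with means computed window by window with mean(), on data that has a large offset. The data below are made up for illustration, an odd width is assumed, and the size of the discrepancy will depend on the platform and the data.

```r
# Made-up data: small variation on top of a large offset
x <- 1e8 + sin(seq_len(1e4))
width <- 5

k <- seq_along(x)
taper <- pmin(width, 2 * k - 1, 2 * rev(k) - 1)  # window size at each position
lower <- k - (taper - 1) / 2
upper <- lower + taper

# cumulative-sum approach (as in tapermean)
cs <- c(0, cumsum(x))
fast <- (cs[upper] - cs[lower]) / taper

# direct approach: mean() over the same windows of the original data
direct <- vapply(k, function(i) mean(x[lower[i]:(upper[i] - 1)]), numeric(1))

# largest absolute discrepancy between the two approaches
max(abs(fast - direct))
```

Any discrepancy comes from rounding in the running total, whose absolute error grows with its magnitude, so long vectors with a large offset are the unfavourable case for the cumulative-sum approach.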

Also note that the rollapply method is the only one that allows you to use na.rm = TRUE if needed.
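
If zoo is not available, a base R alternative that tolerates missing values is to compute each window mean directly. This is a different and slower technique than the cumulative-sum or Rcpp versions, and `tapermean_na` is a hypothetical helper name, not part of any package. It assumes an odd width.

```r
# Hypothetical helper: tapered running mean that can skip NAs (odd width assumed)
tapermean_na <- function(x, width = 5, na.rm = TRUE) {
  half <- (width - 1) / 2
  n <- length(x)
  vapply(seq_len(n), function(i) {
    h <- min(half, i - 1, n - i)            # shrink the window near either end
    mean(x[(i - h):(i + h)], na.rm = na.rm)
  }, numeric(1))
}

x <- c(1, 2, NA, 4, 5, 6, 7)
tapermean_na(x)
# [1] 1.00 1.50 3.00 4.25 5.50 6.00 7.00
```

A window that contains only NA values yields NaN, which is what mean(..., na.rm = TRUE) returns for an empty input.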

Scarabee answered Oct 14 '22 01:10