
Computing rolling sums of stretches of a vector with R

I have a long vector x and another vector v that contains lengths. I would like to sum x so that the answer y is a vector of length length(v), where y[1] is sum(x[1:v[1]]), y[2] is sum(x[(1+v[1]):(v[1]+v[2])]), and so on. Essentially this is a sparse matrix multiplication from a space of dimension length(x) to one of dimension length(v). However, I would prefer not to bring in "advanced machinery", although I might have to. It does need to be very, very fast. Can anyone think of anything simpler than using a sparse matrix package?

Example:

x <- c(1,1,3,4,5)
v <- c(2,3)
y <- myFunc(x,v)

y should be c(2,12)

I am open to any pre-processing, e.g., storing in v the starting indexes of each stretch.
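To pin down the intended result, here is a slow reference sketch rather than a proposed solution. The names starts, ends, and ref_sums are made up for illustration, and the code assumes every entry of v is at least 1 and that sum(v) does not exceed length(x). It also shows how start indexes would follow from the lengths.

```r
x <- c(1, 1, 3, 4, 5)
v <- c(2, 3)

# Pre-processing: turn lengths into the end and start position of each stretch.
ends   <- cumsum(v)       # 2 5
starts <- ends - v + 1    # 1 3

# Hypothetical reference implementation: sum each stretch explicitly.
ref_sums <- function(x, starts, ends) {
  vapply(seq_along(starts),
         function(i) sum(x[starts[i]:ends[i]]),
         numeric(1))
}

y <- ref_sums(x, starts, ends)
y  # 2 12
```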

asked Dec 28 '22 11:12 by ryan


2 Answers

  y <- cumsum(x)[cumsum(v)]
  y <- c(y[1], diff(y))

This looks like it's doing extra work because it's computing the cumsum for the whole vector, but it's actually faster than the other solutions so far, for both small and large numbers of groups.
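As a minimal sketch of why this works, here are the two lines applied to the example from the question, wrapped in a function whose name (stretch_sums) is made up here. The running total of x read at the end of each stretch gives cumulative stretch totals, and differencing those recovers the per-stretch sums. It assumes v[1] is at least 1, since an index of 0 would silently drop the first element, and that sum(v) does not exceed length(x).

```r
# Hypothetical wrapper around the two-line cumsum approach.
stretch_sums <- function(x, v) {
  ends   <- cumsum(v)         # last position of each stretch
  totals <- cumsum(x)[ends]   # running total of x at each stretch end
  c(totals[1], diff(totals))  # differences give the per-stretch sums
}

x <- c(1, 1, 3, 4, 5)
v <- c(2, 3)

cumsum(x)           # 1 2 5 9 14
cumsum(v)           # 2 5
stretch_sums(x, v)  # 2 12
```

Because each result is a difference of two running totals, very long vectors of floating-point values may show small rounding differences compared with summing each stretch directly.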

Here's how I simulated the data:

set.seed(5)
N <- 1e6
n <- 10
x <- round(runif(N,0,100),1)
v <- as.vector(table(sample(n, N, replace=TRUE)))

On my machine the timings with n <- 10 are:

  • Brandon Bertelsen (for loop): 0.017
  • Ramnath (rowsum): 0.057
  • John (split/apply): 0.280
  • Aaron (cumsum): 0.008

Changing to n <- 1e5, the timings are:

  • Brandon Bertelsen (for loop): 2.181
  • Ramnath (rowsum): 0.226
  • John (split/apply): 0.852
  • Aaron (cumsum): 0.015
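The code that produced these timings is not shown. One way to run such a comparison, sketched here with base R's system.time and a smaller N so it finishes quickly, is below. The helper names by_cumsum and by_rowsum are made up, only two of the four approaches are included, and the numbers will differ from machine to machine.

```r
set.seed(5)
N <- 1e5   # assumption: smaller than the 1e6 used above, to keep the run short
n <- 10
x <- round(runif(N, 0, 100), 1)
v <- as.vector(table(sample(n, N, replace = TRUE)))  # group sizes, sum(v) == N

# Hypothetical wrappers around two of the approaches being compared.
by_cumsum <- function(x, v) {
  y <- cumsum(x)[cumsum(v)]
  c(y[1], diff(y))
}
by_rowsum <- function(x, v) as.vector(rowsum(x, rep(seq_along(v), v)))

# Elapsed wall-clock time in seconds for one call of each.
t_cumsum <- system.time(r1 <- by_cumsum(x, v))["elapsed"]
t_rowsum <- system.time(r2 <- by_rowsum(x, v))["elapsed"]

# Check agreement up to floating-point tolerance before trusting any timing.
same <- isTRUE(all.equal(r1, r2))
same
```

A single system.time call is noisy for runs this short, so repeating each call and taking a median would give steadier numbers.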

I suspect this is faster than doing matrix multiplication, even with a sparse matrix package, because one doesn't have to form the matrix or do any multiplication. If more speed is needed, I suspect it could be sped up by writing it in C; that's not hard to do with the inline and Rcpp packages, but I'll leave that to you.
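For illustration only, a compiled version might look like the sketch below. It uses Rcpp's cppFunction rather than the inline package, the function name stretch_sums_cpp is made up, and it assumes Rcpp and a working C++ toolchain are installed. Whether it actually beats the cumsum version has not been measured here.

```r
# Assumes the Rcpp package and a C++ compiler are available; skipped otherwise.
if (requireNamespace("Rcpp", quietly = TRUE)) {
  Rcpp::cppFunction('
    NumericVector stretch_sums_cpp(NumericVector x, IntegerVector v) {
      R_xlen_t n = v.size();
      NumericVector y(n);
      R_xlen_t pos = 0;  // next unread position in x
      for (R_xlen_t i = 0; i < n; ++i) {
        // Guard against lengths that run past the end of x.
        if (v[i] < 0 || pos + v[i] > x.size())
          stop("lengths in v do not fit inside x");
        double s = 0.0;
        for (int j = 0; j < v[i]; ++j) s += x[pos++];
        y[i] = s;
      }
      return y;
    }
  ')
  stretch_sums_cpp(c(1, 1, 3, 4, 5), c(2, 3))  # 2 12
}
```

This makes a single pass over x and sums each stretch directly, so it avoids both the intermediate cumsum vector and the differencing step.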

answered Jan 11 '23 10:01 by Aaron left Stack Overflow


You can do this using rowsum. It should be reasonably fast as it uses C code in the background.

y <- rowsum(x, rep(seq_along(v), v))
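As a small sketch on the question's example data: rowsum returns a one-column matrix with the group labels as row names, so an extra as.vector call is needed if a plain vector like c(2, 12) is wanted. The variable names groups and y_mat are made up for illustration.

```r
x <- c(1, 1, 3, 4, 5)
v <- c(2, 3)

# Label every element of x with the index of the stretch it belongs to.
groups <- rep(seq_along(v), v)   # 1 1 2 2 2

y_mat <- rowsum(x, groups)       # 2 x 1 matrix with row names "1" and "2"
y     <- as.vector(y_mat)        # plain numeric vector
y  # 2 12
```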
answered Jan 11 '23 10:01 by Ramnath
