I have a long vector x and another vector v that contains lengths. I would like to sum x so that the answer y is a vector of length length(v), where y[1] is sum(x[1:v[1]]), y[2] is sum(x[(1+v[1]):(v[1]+v[2])]), and so on. Essentially this is a sparse matrix multiplication from a space of dimension length(x) to one of dimension length(v). However, I would prefer not to bring in "advanced machinery", although I might have to. It does need to be very, very fast. Can anyone think of anything simpler than using a sparse matrix package?
Example -
x <- c(1,1,3,4,5)
v <- c(2,3)
y <- myFunc(x,v)
y should be c(2,12).
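To make the matrix view concrete, here is a small sketch that builds a dense 0/1 indicator matrix in base R. The names g and M are illustrative and not from the question. It only shows what the multiplication computes; a dense matrix like this would be far too large for a long x.

```r
x <- c(1, 1, 3, 4, 5)
v <- c(2, 3)

# g[j] is the index of the segment that x[j] belongs to: 1 1 2 2 2
g <- rep(seq_along(v), v)

# M is length(v) x length(x); M[i, j] is 1 when x[j] falls in segment i
M <- outer(seq_along(v), g, "==") * 1

# Each row of M picks out one segment, so M %*% x gives the segment sums
y_mat <- drop(M %*% x)
y_mat
```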
I am open to any pre-processing, e.g., storing in v the starting indexes of each stretch.
y <- cumsum(x)[cumsum(v)]
y <- c(y[1], diff(y))
This looks like it's doing extra work because it's computing the cumsum for the whole vector, but it's actually faster than the other solutions so far, for both small and large numbers of groups.
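As a sketch, the two lines above can be wrapped in a function. The name seg_sums is hypothetical. The leading zero is a small variation of my own, assumed here so that zero-length segments (including a zero in v[1]) yield 0 instead of being dropped by the indexing.

```r
# Hypothetical helper: segment sums via differences of prefix sums.
seg_sums <- function(x, v) {
  stopifnot(sum(v) == length(x))   # the lengths must cover x exactly
  prefix <- c(0, cumsum(x))        # prefix[k + 1] = sum of the first k elements
  ends <- cumsum(c(0, v))          # segment boundaries: 0, v[1], v[1] + v[2], ...
  diff(prefix[ends + 1])           # one difference per segment
}

x <- c(1, 1, 3, 4, 5)
v <- c(2, 3)
seg_sums(x, v)            # 2 12, matching the example in the question
seg_sums(x, c(0, 2, 3))   # 0 2 12, an empty first segment sums to 0
```

Because each result is a difference of two running totals, floating-point rounding from the whole prefix can show up in an individual segment sum. For data like the example this is negligible, but it is worth checking if x is very long and spans very different magnitudes.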
Here's how I simulated the data:
set.seed(5)
N <- 1e6
n <- 10
x <- round(runif(N,0,100),1)
v <- as.vector(table(sample(n, N, replace=TRUE)))
On my machine the timings with n <- 10 are:
Changing to n <- 1e5, the timings are:
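To produce timings of this kind on your own machine, a minimal sketch with base R's system.time might look like the following. The wrapper names seg_cumsum and seg_rowsum are hypothetical, and comparing against the rowsum approach is my assumption about what is worth measuring; results will vary by machine.

```r
set.seed(5)
N <- 1e6
n <- 10   # rerun with n <- 1e5 for the many-groups case
x <- round(runif(N, 0, 100), 1)
v <- as.vector(table(sample(n, N, replace = TRUE)))

# cumsum approach from this answer
seg_cumsum <- function(x, v) {
  y <- cumsum(x)[cumsum(v)]
  c(y[1], diff(y))
}

# rowsum approach, converted from a one-column matrix to a plain vector
seg_rowsum <- function(x, v) as.vector(rowsum(x, rep(seq_along(v), v)))

t_cumsum <- system.time(y1 <- seg_cumsum(x, v))
t_rowsum <- system.time(y2 <- seg_rowsum(x, v))
print(t_cumsum)
print(t_rowsum)

# Both approaches should agree up to floating-point tolerance
stopifnot(isTRUE(all.equal(y1, y2)))
```

A single system.time call is noisy for operations this fast, so repeating each call several times and comparing the medians gives a steadier picture.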
I suspect this is faster than doing matrix multiplication, even with a sparse matrix package, because one doesn't have to form the matrix or do any multiplication. If more speed is needed, I suspect it could be sped up by writing it in C; that's not hard to do with the inline and Rcpp packages, but I'll leave that to you.
You can do this using rowsum. It should be reasonably fast as it uses C code in the background.
y <- rowsum(x, rep(1:length(v), v))
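One thing to be aware of is that rowsum returns a one-column matrix with the group labels as row names, not a plain vector. A small sketch on the example from the question (the intermediate names g and m are illustrative):

```r
x <- c(1, 1, 3, 4, 5)
v <- c(2, 3)

g <- rep(seq_along(v), v)   # group label per element: 1 1 2 2 2
m <- rowsum(x, g)           # 2 x 1 matrix with row names "1" and "2"
y <- as.vector(m)           # drop the matrix structure to get a plain vector
y
```

If v contains a zero, rep produces no label for that group, so the result has fewer entries than length(v). That case would need separate handling if it can occur in your data.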