I am calculating standard deviations on an expanding window where at each point I recalculate the standard deviation. This seems like a fairly straightforward thing to do that should be relatively fast. However, it takes a lot longer than you might think (~45 seconds). Am I missing something here? In Matlab this is quite fast.
t0 <- proc.time()[[3]]
z <- rep(0, 7000)
x <- rnorm(8000)
for(i in 1000:8000){
## print(i)
z[i] <- sd(x[1:i])
}
print(proc.time()[[3]]- t0)
You might also try an algorithm that updates the standard deviation (well, actually, the sum of squares of differences from the mean) as you go. On my system this reduces the time from ~0.8 seconds to ~0.002 seconds.
n <- length(x)
m <- cumsum(x)/(1:n)
m1 <- c(NA,m[1:(n-1)])
ssd <- (x-m)*(x-m1)
v <- c(0,cumsum(ssd[-1])/(1:(n-1)))
z <- sqrt(v)
See http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance for details.
Also see the answers to this question: Efficient calculation of matrix cumulative standard deviation in r
Edited to fix some typos, sorry.
This takes ~1.3 seconds on my machine:
t0 <- proc.time()[[3]]
x <- rnorm(8000)
z <- sapply(1000:8000,function(y){sd(x[seq_len(y)])})
print(proc.time()[[3]]- t0)
and I'd be willing to bet there are even faster ways of doing this. Avoid explicit for
loops!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With