Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Surprisingly Slow Standard Deviation in R

Tags:

r

I am calculating standard deviations on an expanding window where at each point I recalculate the standard deviation. This seems like a fairly straightforward thing to do that should be relatively fast. However, it takes a lot longer than you might think (~45 seconds). Am I missing something here? In Matlab this is quite fast.

t0 <- proc.time()[[3]]
z <- rep(0, 7000)
x <- rnorm(8000)
for(i in 1000:8000){
##    print(i)
    z[i] <- sd(x[1:i])
}
print(proc.time()[[3]]- t0)
like image 664
rlh2 Avatar asked Dec 09 '22 05:12

rlh2


2 Answers

You might also try an algorithm that updates the standard deviation (well, actually, the sum of squares of differences from the mean) as you go. On my system this reduces the time from ~0.8 seconds to ~0.002 seconds.

n <- length(x)
m <- cumsum(x)/(1:n)
m1 <- c(NA,m[1:(n-1)])
ssd <- (x-m)*(x-m1)
v <- c(0,cumsum(ssd[-1])/(1:(n-1)))
z <- sqrt(v)

See http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance for details.

Also see the answers to this question: Efficient calculation of matrix cumulative standard deviation in r

like image 166
Aaron left Stack Overflow Avatar answered Dec 21 '22 23:12

Aaron left Stack Overflow


Edited to fix some typos, sorry.

This takes ~1.3 seconds on my machine:

t0 <- proc.time()[[3]]
x <- rnorm(8000)
z <- sapply(1000:8000,function(y){sd(x[seq_len(y)])})
print(proc.time()[[3]]- t0)

and I'd be willing to bet there are even faster ways of doing this. Avoid explicit for loops!

like image 32
joran Avatar answered Dec 21 '22 23:12

joran