How to compute rolling covariance more efficiently

I am trying to compute a rolling covariance between each column of a matrix (my x variable) and another vector (my y variable) in R. I thought I could use one of the apply functions, but I cannot find a way to roll two sets of inputs at the same time. Here is what I tried:

library(zoo)  # for rollapply
set.seed(1)
x <- matrix(rnorm(500), nrow = 100, ncol = 5)
y <- rnorm(100)
rollapply(x, width = 5, FUN = function(x) cov(x, y))
z <- cbind(x, y)
rollapply(z, width = 5, FUN = function(x) cov(z, z[, 6]))

But neither does what I want. One solution I found uses a for loop, but I am wondering whether I can be more efficient in R than:

dResult <- matrix(nrow = 96, ncol = 5)
for (iLine in 1:96) {
    for (iCol in 1:5) {
        dResult[iLine, iCol] <- cov(x[iLine:(iLine + 4), iCol], y[iLine:(iLine + 4)])
    }
}

which gives me the expected result:

head(dResult)


           [,1]       [,2]        [,3]        [,4]        [,5]
[1,]  0.32056460 0.05281386 -1.13283586 -0.01741274 -0.01464430
[2,] -0.03246014 0.78631603 -0.34309778  0.29919297 -0.22243572
[3,] -0.16239479 0.56372428 -0.27476604  0.39007645  0.05461355
[4,] -0.56764687 0.09847672  0.11204244  0.78044096 -0.01980684
[5,] -0.43081539 0.01904417  0.01282632  0.35550327  0.31062580
[6,] -0.28890607 0.03967327  0.58307743  0.15055881  0.60704533
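
For reference, the double loop above can also be vectorized in plain base R using the identity cov(x, y) = (Σxy − Σx·Σy/n)/(n − 1) on each window, with the windowed sums taken from cumulative sums. This is a sketch, not from the original post; `roll_cov` is a made-up helper name:

```r
# Sketch of a loop-free rolling covariance in base R (roll_cov is a
# hypothetical helper, not from any package). It uses the identity
#   cov(x, y) = (sum(x*y) - sum(x) * sum(y) / n) / (n - 1)
# with windowed sums computed from cumulative sums.
roll_cov <- function(x, y, width) {
  pad <- function(v) c(0, cumsum(v))        # prepend 0 so windows index cleanly
  sx  <- pad(x); sy <- pad(y); sxy <- pad(x * y)
  i   <- width:length(x)                    # right edge of each window
  wsx  <- sx[i + 1]  - sx[i - width + 1]    # windowed sum of x
  wsy  <- sy[i + 1]  - sy[i - width + 1]    # windowed sum of y
  wsxy <- sxy[i + 1] - sxy[i - width + 1]   # windowed sum of x*y
  (wsxy - wsx * wsy / width) / (width - 1)
}

set.seed(1)
x <- matrix(rnorm(500), nrow = 100, ncol = 5)
y <- rnorm(100)
dResult2 <- apply(x, 2, roll_cov, y = y, width = 5)  # 96 x 5, same as the loop
```

One caveat: the cumulative-sum trick can lose floating-point precision on very long series, so for serious use a compiled routine like the one in the accepted answer is preferable.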
asked Jul 28 '16 by Djiggy

2 Answers

If you need something faster, and you do not need any of the non-default arguments to cov, you can use TTR::runCov. Note that it pads the result with leading NA values by default.

The speed difference will matter more on larger data. Here's an example of how to use it:

cov_joshua <- function() {
  apply(x, 2, function(x, y) TTR::runCov(x, y, 5), y = y)
}

And here's a comparison to the currently accepted answer using the small dataset the OP provided:

cov_osssan <- function() {
  f <- function(b) cov(b[,1], b[,2])
  apply(x, 2, function(a) {
    rollapplyr(cbind(a,y), width=5, FUN = f, by.column=FALSE)
  })
}
require(zoo)  # for cov_osssan
require(microbenchmark)
set.seed(1)
nr <- 100
nc <- 5
x <- matrix(rnorm(nc*nr),nrow=nr,ncol=nc)
y <- rnorm(nr)
microbenchmark(cov_osssan(), cov_joshua())
# Unit: milliseconds
#          expr       min        lq    median       uq      max neval
#  cov_osssan() 22.881253 24.569906 25.625623 27.44348 32.81344   100
#  cov_joshua()  5.841422  6.170189  6.706466  7.47609 31.24717   100
all.equal(cov_osssan(), cov_joshua()[-(1:4),])  # rm leading NA
# [1] TRUE

Now, using a larger data set (the same setup as above with a much larger nr):

system.time(cov_joshua())
#    user  system elapsed 
#   2.117   0.032   2.158 
system.time(cov_osssan())
# ^C
# Timing stopped at: 144.957 0.36 145.491 

I got tired of waiting (after ~2.5 minutes) for cov_osssan() to complete.

answered Oct 25 '22 by Joshua Ulrich


This is a solution with rollapply() and sapply():

sapply(1:5, function(j) rollapply(1:100, 5, function(i) cov(x[i, j], y[i])))

I think it is more readable and more R-like than the for-loop solution, but I checked with microbenchmark and it appears to be slower.
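
The same windowed-index idea also works without zoo. A base-R sketch (assuming the same x and y as in the question) enumerates the windows with embed():

```r
# Base-R variant of the rolling-index idea: embed() lists every 5-wide
# window of indices (rows come out in reverse order, which cov() does
# not mind, since covariance ignores the ordering of the pairs).
set.seed(1)
x <- matrix(rnorm(500), nrow = 100, ncol = 5)
y <- rnorm(100)

win <- embed(seq_along(y), 5)   # 96 x 5 matrix of rolling index windows
res <- sapply(seq_len(ncol(x)), function(j)
  apply(win, 1, function(i) cov(x[i, j], y[i])))
```

This avoids the zoo dependency, though it still calls cov() once per window and column, so it has the same asymptotic cost as the rollapply() version.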

answered Oct 25 '22 by Stibu