I am trying to compute a rolling covariance between a set of data (each column of my x variable) and one other (y variable) in R. I thought I could use one of the apply functions, but can not find how to roll two set of inputs at the same time. Here is what I tried :
set.seed(1)
x<-matrix(rnorm(500),nrow=100,ncol=5)
y<-rnorm(100)
rollapply(x,width=5,FUN= function(x) {cov(x,y)})
z<-cbind(x,y)
rollapply(z,width=5, FUN=function(x){cov(z,z[,6])})
But none is doing what I would like. One solution I found is to use a for loop, but wondering if I can be more efficient in R than :
dResult<-matrix(nrow=96,ncol=5)
for(iLine in 1:96){
for(iCol in 1:5){
dResult[iLine,iCol]=cov(x[iLine:(iLine+4),iCol],y[iLine:(iLine+4)])
}
}
which gives me the expected result :
head(dResult)
[,1] [,2] [,3] [,4] [,5]
[1,] 0.32056460 0.05281386 -1.13283586 -0.01741274 -0.01464430
[2,] -0.03246014 0.78631603 -0.34309778 0.29919297 -0.22243572
[3,] -0.16239479 0.56372428 -0.27476604 0.39007645 0.05461355
[4,] -0.56764687 0.09847672 0.11204244 0.78044096 -0.01980684
[5,] -0.43081539 0.01904417 0.01282632 0.35550327 0.31062580
[6,] -0.28890607 0.03967327 0.58307743 0.15055881 0.60704533
If you need something faster, and you do not need any of the non-default arguments to cov
, you can use TTR::runCov
. Note that it pads with leading NA
by default.
The speed difference will matter more on larger data. Here's an example of how to use it:
cov_joshua <- function() {
apply(x, 2, function(x, y) TTR::runCov(x, y, 5), y = y)
}
And here's a comparison to the currently accepted answer using the small dataset the OP provided:
cov_osssan <- function() {
f <- function(b) cov(b[,1], b[,2])
apply(x, 2, function(a) {
rollapplyr(cbind(a,y), width=5, FUN = f, by.column=FALSE)
})
}
require(zoo) # for cov_osssan
require(microbenchmark)
set.seed(1)
nr <- 100
nc <- 5
x <- matrix(rnorm(nc*nr),nrow=nr,ncol=nc)
y <- rnorm(nr)
microbenchmark(cov_osssan(), cov_joshua())
# Unit: milliseconds
# expr min lq median uq max neval
# cov_osssan() 22.881253 24.569906 25.625623 27.44348 32.81344 100
# cov_joshua() 5.841422 6.170189 6.706466 7.47609 31.24717 100
all.equal(cov_osssan(), cov_joshua()[-(1:4),]) # rm leading NA
# [1] TRUE
Now, using a larger data set:
system.time(cov_joshua())
# user system elapsed
# 2.117 0.032 2.158
system.time(cov_osssan())
# ^C
# Timing stopped at: 144.957 0.36 145.491
I got tired of waiting (after ~2.5 minutes) for cov_osssan
to complete.
This is a solution with rollapply()
and sapply()
:
sapply(1:5, function(j) rollapply(1:100, 5, function(i) cov(x[i, j], y[i])))
I think that it is more readable and more R-ish than the solution with for-loops, but I checked with microbenchmark
and it seems to be slower.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With