I have an xts object with 1033 daily return observations for 5 currency pairs, on which I want to run a rolling-window regression, but rollapply is not working with my function, which uses lm(). Here is my data:
> head(fxr)
USDZAR USDEUR USDGBP USDCHF USDCAD
2007-10-18 -0.005028709 -0.0064079963 -0.003878743 -0.0099537170 -0.0006153215
2007-10-19 -0.001544470 0.0014275520 -0.001842564 0.0023058211 -0.0111410271
2007-10-22 0.010878027 0.0086642116 0.010599365 0.0051899551 0.0173792230
2007-10-23 -0.022783987 -0.0075236355 -0.010804304 -0.0041668499 -0.0144788687
2007-10-24 -0.006561223 0.0008545792 0.001024275 -0.0004261666 0.0049525483
2007-10-25 -0.014788901 -0.0048523001 -0.001434280 -0.0050425302 -0.0046422944
> tail(fxr)
USDZAR USDEUR USDGBP USDCHF USDCAD
2012-02-10 0.018619309 0.007548205 0.005526184 0.006348533 0.0067151342
2012-02-13 -0.006449463 -0.001055966 -0.002206810 -0.001638002 -0.0016995755
2012-02-14 0.006320364 0.006843933 0.006605875 0.005992935 0.0007001751
2012-02-15 -0.001666872 0.004319096 -0.001568874 0.003686840 -0.0015009759
2012-02-16 0.006419616 -0.003401364 -0.005194817 -0.002709588 -0.0019044761
2012-02-17 -0.004339687 -0.003675992 -0.003319899 -0.003043481 0.0000000000
I can easily run an lm on it for the whole data set to model USDZAR against the other pairs:
> lm(USDZAR ~ ., data = fxr)$coefficients
(Intercept) USDEUR USDGBP USDCHF USDCAD
-1.309268e-05 5.575627e-01 1.664283e-01 -1.657206e-01 6.350490e-01
However, I want to run a rolling 62-day window to get the evolution of these coefficients over time, so I created a function, dolm, which does this:
> dolm
function(x) {
return(lm(USDZAR ~ ., data = x)$coefficients)
}
However, when I run rollapply on this, I get the following:
> rollapply(fxr, 62, FUN = dolm)
Error in terms.formula(formula, data = data) :
'.' in formula and no 'data' argument
This happens even though dolm(fxr) on its own works fine:
> dolm(fxr)
(Intercept) USDEUR USDGBP USDCHF USDCAD
-1.309268e-05 5.575627e-01 1.664283e-01 -1.657206e-01 6.350490e-01
What's going on here? It seems to work fine if dolm is a simpler function, for example mean:
> dolm <- edit(dolm)
> dolm
function(x) {
return(mean(x))
}
> rollapply(fxr, 62, FUN = dolm)
USDZAR USDEUR USDGBP USDCHF USDCAD
2007-11-29 -1.766901e-04 -6.899297e-04 6.252596e-04 -1.155952e-03 7.021468e-04
2007-11-30 -1.266130e-04 -6.512204e-04 7.067767e-04 -1.098413e-03 7.247315e-04
2007-12-03 8.949942e-05 -6.406932e-04 6.637066e-04 -1.154806e-03 8.727564e-04
2007-12-04 2.042046e-04 -5.758493e-04 5.497422e-04 -1.116308e-03 7.124593e-04
2007-12-05 7.343586e-04 -4.899982e-04 6.161819e-04 -1.057904e-03 9.915495e-04
Any help much appreciated. Essentially what I want is to get the weightings for the regression of USDZAR ~ USDEUR + USDGBP + USDCHF + USDCAD over a rolling 62-day window.
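There are several problems here:
- rollapply passes a matrix to the function, but lm requires a data.frame.
- rollapply applies the function to each column separately unless we specify by.column = FALSE.
- rollapply centers each window by default; if you want each result aligned with the last date of its window, use rollapplyr.
1) Incorporating the above we have: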
dolm <- function(x) coef(lm(USDZAR ~ ., data = as.data.frame(x)))
rollapplyr(fxr, 62, dolm, by.column = FALSE)
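As a self-contained check of the corrected call, here is a minimal sketch on simulated data. The object sim and its column names y, x1 and x2 are made up for illustration and only stand in for fxr. It compares the last rolling row against a plain lm() fit on the final 62 rows:

```r
library(zoo)

# Simulated stand-in for fxr (hypothetical data, not the asker's series)
set.seed(1)
n <- 200
sim <- zoo(matrix(rnorm(n * 3), n, 3,
                  dimnames = list(NULL, c("y", "x1", "x2"))),
           order.by = as.Date("2020-01-01") + seq_len(n) - 1)

# Same pattern as dolm above, with y as the response
dolm_sim <- function(x) coef(lm(y ~ ., data = as.data.frame(x)))

# One row of coefficients per right-aligned 62-row window
rolled <- rollapplyr(sim, 62, dolm_sim, by.column = FALSE)

# Direct fit on the final 62 observations for comparison
direct <- coef(lm(y ~ ., data = as.data.frame(coredata(sim)[(n - 61):n, ])))
all.equal(as.numeric(coredata(rolled)[nrow(rolled), ]), as.numeric(direct))
```

Without fill = NA, the result has n - 62 + 1 rows, each dated at the end of its window.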
2) An alternative to the lm in the dolm above is to use lm.fit, which works directly with matrices and is also faster:
dolm <- function(x) coef(lm.fit(cbind(Intercept = 1, x[,-1]), x[,1]))
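Note that lm.fit does not add an intercept on its own, which is why the design matrix is built with cbind(Intercept = 1, ...). The following sketch, using a small made-up matrix m with the response in the first column, suggests that the two approaches agree on a single window:

```r
# Hypothetical 62-row window with the response in column 1
set.seed(2)
m <- cbind(y = rnorm(62), x1 = rnorm(62), x2 = rnorm(62))

# lm.fit version: intercept column added by hand
dolm_fit <- function(x) coef(lm.fit(cbind(Intercept = 1, x[, -1]), x[, 1]))

b_fit <- dolm_fit(m)
b_lm  <- coef(lm(y ~ ., data = as.data.frame(m)))

# Names differ ("Intercept" vs "(Intercept)"), so compare values only
all.equal(unname(b_fit), unname(b_lm))
```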
G. Grothendieck's answer is correct, but you can do it faster with the rollRegres package, as the following example shows (the roll_regres.fit function is more than 100 times faster in the benchmark below):
# simulate data
set.seed(101)
n <- 1000
wdth = 100
X <- matrix(rnorm(10 * n), n, 10)
y <- drop(X %*% runif(10)) + rnorm(n)
Z <- cbind(y, X)
# assign other function
dolm <- function(x)
coef(lm.fit(x[, -1], x[, 1]))
# show that they yield the same
library(zoo)
library(rollRegres)
all.equal(
rollapply(Z, wdth, FUN = dolm,
by.column = FALSE, align = "right", fill = NA_real_),
roll_regres.fit(X, y, wdth)$coefs,
check.attributes = FALSE)
#R [1] TRUE
# benchmark
library(compiler)
dolm <- cmpfun(dolm)
microbenchmark::microbenchmark(
newnew = roll_regres.fit(X, y, wdth),
prev = rollapply(Z, wdth, FUN = dolm,
by.column = FALSE, align = "right", fill = NA_real_),
times = 10)
#R Unit: microseconds
#R expr min lq mean median uq max neval
#R newnew 884.938 950.914 1026.134 1025.581 1057.581 1242.075 10
#R prev 111057.822 111903.649 118867.761 116857.726 122087.160 141362.229 10
You can also use the roll_regres function from the package if you want to use an R formula instead.
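A minimal sketch of the formula interface, assuming the roll_regres(formula, data, width) call form and the coefs element used above. The data frame df and the column names x1 to x3 are made up for illustration. Unlike the matrix example above, the formula adds an intercept, so the matrix version needs an explicit column of ones to match:

```r
library(rollRegres)

# Simulated data (hypothetical names, smaller than the example above)
set.seed(101)
n <- 1000
wdth <- 100
X <- matrix(rnorm(3 * n), n, 3, dimnames = list(NULL, c("x1", "x2", "x3")))
y <- drop(X %*% runif(3)) + rnorm(n)
df <- data.frame(y = y, X)

# Formula interface: intercept is included by the formula
fit_formula <- roll_regres(y ~ x1 + x2 + x3, df, width = wdth)

# Matrix interface with the intercept column added by hand
fit_matrix <- roll_regres.fit(cbind(`(Intercept)` = 1, X), y, wdth)

all.equal(fit_formula$coefs, fit_matrix$coefs, check.attributes = FALSE)
```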
A third option would be to update the R matrix in a QR decomposition, as done in the code below. You can speed this up by doing it in C++, but then you will need the dchud and dchdd subroutines from LINPACK (or another function to update R).
library(SamplerCompare) # for LINPACK `chdd` and `chud`
roll_coef <- function(X, y, width){
  n <- nrow(X)
  p <- ncol(X)
  out <- matrix(NA_real_, n, p)
  is_first <- TRUE
  i <- width
  while(i <= n){
    if(is_first){
      is_first <- FALSE
      qr. <- qr(X[1:width, ])
      R <- qr.R(qr.)
      # Use X^T for the rest
      X <- t(X)
      XtY <- drop(tcrossprod(y[1:width], X[, 1:width]))
    } else {
      x_new <- X[, i]
      x_old <- X[, i - width]
      # update R
      R <- .Fortran(
        "dchud", R, p, p, x_new, 0., 0L, 0L,
        0., 0., numeric(p), numeric(p),
        PACKAGE = "SamplerCompare")[[1]]
      # downdate R
      R <- .Fortran(
        "dchdd", R, p, p, x_old, 0., 0L, 0L,
        0., 0., numeric(p), numeric(p), integer(1),
        PACKAGE = "SamplerCompare")[[1]]
      # update XtY
      XtY <- XtY + y[i] * x_new - y[i - width] * x_old
    }
    coef. <- .Internal(backsolve(R, XtY, p, TRUE, TRUE))
    out[i, ] <- .Internal(backsolve(R, coef., p, TRUE, FALSE))
    i <- i + 1
  }
  out
}
# simulate data
set.seed(101)
n <- 1000
wdth = 100
X <- matrix(rnorm(10 * n), n, 10)
y <- drop(X %*% runif(10)) + rnorm(n)
Z <- cbind(y, X)
# assign other function
dolm <- function(x)
coef(lm.fit(x[, -1], x[, 1]))
# show that they yield the same
library(zoo)
all.equal(
rollapply(Z, wdth, FUN = dolm,
by.column = FALSE, align = "right", fill = NA_real_),
roll_coef(X, y, wdth),
check.attributes = FALSE)
#R> [1] TRUE
# benchmark
library(compiler)
roll_coef <- cmpfun(roll_coef)
dolm <- cmpfun(dolm)
microbenchmark::microbenchmark(
new = roll_coef(X, y, wdth),
prev = rollapply(Z, wdth, FUN = dolm,
by.column = FALSE, align = "right", fill = NA_real_),
times = 10)
#R> Unit: milliseconds
#R> expr min lq mean median uq max neval cld
#R> new 8.631319 9.010579 9.808525 9.659665 9.973741 11.87083 10 a
#R> prev 118.257128 121.734860 124.489826 122.882318 127.195410 135.21280 10 b
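For readers who want to see why roll_coef works, here is a rough base-R sketch of the two identities it relies on, using small made-up matrices (Xs, ys and the window width w are illustrative only). First, with R from the QR decomposition, t(R) %*% R equals t(X) %*% X, so the coefficients come from two triangular solves. Second, sliding the window by one row changes t(X) %*% X by adding one outer product and subtracting another, which is what the dchud and dchdd calls apply to R:

```r
set.seed(4)
Xs <- matrix(rnorm(50 * 3), 50, 3)
ys <- rnorm(50)

# Identity 1: normal equations solved through the triangular factor R
R   <- qr.R(qr(Xs))
XtY <- drop(crossprod(Xs, ys))
z    <- backsolve(R, XtY, transpose = TRUE)  # solves t(R) %*% z = XtY
beta <- backsolve(R, z)                      # solves R %*% beta = z
all.equal(beta, unname(coef(lm.fit(Xs, ys))))

# Identity 2: moving the window adds the new row and drops the oldest one
w <- 10
A_old <- crossprod(Xs[1:w, ])
A_new <- crossprod(Xs[2:(w + 1), ])
all.equal(A_new, A_old + outer(Xs[w + 1, ], Xs[w + 1, ]) - outer(Xs[1, ], Xs[1, ]))
```

This sketch recomputes nothing incrementally; it only checks the algebra that the LINPACK update and downdate routines exploit.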
The roll_coef solution above requires that you form the model.matrix and model.response first, but this is just three calls (one extra to model.frame) prior to the call to roll_coef.
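Those three calls might look like the following sketch, where the data frame dat and the formula are made up for illustration:

```r
# Hypothetical data frame and formula
set.seed(3)
dat <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20))

mf   <- model.frame(y ~ x1 + x2, data = dat)  # evaluates the formula on the data
Xmat <- model.matrix(attr(mf, "terms"), mf)   # design matrix, intercept included
yvec <- model.response(mf)                    # response vector

colnames(Xmat)
# Xmat and yvec can then be passed on, e.g. roll_coef(Xmat, yvec, width)
```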