I am trying to do the matrix multiplication S_g
for each i, and each g with i. This is what I have tried so far, but it takes a huge amount of time to complete. Is there a more computationally efficient method to do exactly the same thing?
The main thing to note from this formula is the S_g
uses X_gamma and Y[,i] in matrix multiplication set-up. X_gamma is dependent on value g
. Therefore, for each i, I have to perform g
matrix multiplications.
Here is the logic:
My main problem is that IN REALITY, g = 13,000
, and i = 700
.
library(foreach)
library(doParallel) ## parallel backend for the foreach function
registerDoParallel()
T = 3
c = 100
X <- zoo(data.frame(A = c(0.1, 0.2, 0.3), B = c(0.4, 0.5, 0.6), C = c(0.7,0.8,0.9)),
order.by = seq(from = as.Date("2013-01-01"), length.out = 3, by = "month"))
Y <- zoo(data.frame(Stock1 = rnorm(3,0,0.5), Stock2 = rnorm(3,0,0.5), Stock3 = rnorm(3,0,0.5)),
order.by = seq(from = as.Date("2013-01-01"), length.out = 3, by = "month"))
l <- rep(list(0:1),ncol(X))
set = do.call(expand.grid, l)
colnames(set) <- colnames(X)
I = diag(T)
denom <- foreach(i=1:ncol(Y)) %dopar% {
library(zoo)
library(stats)
library(Matrix)
library(base)
result = c()
for(g in 1:nrow(set)) {
X_gamma = X[,which(colnames(X) %in% colnames(set[which(set[g,] != 0)]))]
S_g = Y[,i] %*% (I - (c/(1+c))*(X_gamma %*% solve(crossprod(X_gamma)) %*% t(X_gamma))) %*% Y[,i]
result[g] = ((1+c)^(-sum(set[g,])/2)) * ((S_g)^(-T/2))
}
sum(result)
}
Thank you for your help!
In linear algebra, the Strassen algorithm, named after Volker Strassen, is an algorithm for matrix multiplication. It is faster than the standard matrix multiplication algorithm for large matrices, with a better asymptotic complexity, although the naive algorithm is often better for smaller matrices.
As matrices grow larger, the number of multiplications needed to find their product increases much faster than the number of additions.
Because MATLAB is a programming language at first developed for numerical linear algebra (matrix manipulations), which has libraries especially developed for matrix multiplications.
The most obvious problem is that you fell victim to one of the classic blunders: not preallocating the output vector result
. Appending one value at a time can be very inefficient for large vectors.
In your case, result
doesn't need to be a vector: you can accumulate the results in a single value:
result = 0
for(g in 1:nrow(set)) {
# snip
result = result + ((1+c)^(-sum(set[g,])/2)) * ((S_g)^(-T/2))
}
result
But I think the most important performance improvement that you could make is to precompute expressions that are currently being computed repeatedly in the foreach
loop. You can do that with a separate foreach
loop. I also suggest using solve
differently to avoid the second matrix multiplication:
X_gamma_list <- foreach(g=1:nrow(set)) %dopar% {
X_gamma <- X[, which(set[g,] != 0)]
I - (c/(1+c)) * (X_gamma %*% solve(crossprod(X_gamma), t(X_gamma)))
}
These computations are now performed only once, rather than once for each column of Y
, which is 700 times less work in your case.
In a similar vein, it makes sense to factor out the expression ((1+c)^(-sum(set[g,])/2))
, as suggested by tim riffe, as well as -T / 2
while we're at it:
a <- (1+c) ^ (-rowSums(set) / 2)
nT2 <- -T / 2
To iterate over the columns of the zoo
object Y
, I suggest using the isplitCols
function from the itertools
package. Make sure you load itertools
at the top of your script:
library(itertools)
isplitCols
let's you send only the columns that are needed for each task, rather than sending the entire object to all workers. The only trick is that you need to remove the dim
attribute from the resulting zoo
objects for your code to work, since isplitCols
uses drop=TRUE
.
Finally, here's the main foreach
loop:
denom <- foreach(Yi=isplitCols(Y, chunkSize=1), .packages='zoo') %dopar% {
dim(Yi) <- NULL # isplitCols uses drop=FALSE
result <- 0
for(g in seq_along(X_gamma_list)) {
S_g <- Yi %*% X_gamma_list[[g]] %*% Yi
result <- result + a[g] * S_g ^ nT2
}
result
}
Note that I would not perform the inner loop in parallel. That would only make sense if there weren't enough columns in Y
to keep all of your processors busy. Parallelizing the inner loop could result in tasks that are too short, effectively unchunking the computation and making the code run much slower. It's much more important to perform the inner loop efficiently since g
is large.
I second @eddi that you should give some objects so we can run code. The following remarks are based on staring:
1) you could save S_g
in a preallocated vector and do that last line ( ((1+c)^(-sum(set[g,])/2)) * ((S_g)^(-T/2))
) out of the loop, since rowSums(set)
will give you what you need. That removes one instance of indexing with g
2) indexing is slowing you down. Don't use which()
. Logical vectors work just fine.
3) -T/2
is dangerous. It can mean -0.5
. If that's what you want, then just do 1/sqrt(S_g_vec)
for speed.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With