I think many more people will be interested in this subject. I have a specific task to solve in the most efficient way. My base data are:
- time indices of buy and sell signals
- on the diagonal of the time-index matrix, the ROC (rate of change) between the closest buy-sell pairs:
r <- array(data = NA,
           dim = c(5, 5),
           dimnames = list(buy_idx  = c(1, 5, 9, 12, 16),
                           sell_idx = c(3, 7, 10, 14, 19)))
diag(r) <- c(1.04, 0.97, 1.07, 1.21, 1.1)
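For reference, the matrix then holds the per-pair ROC on its diagonal and NA everywhere else (console output, roughly):

r
#        sell_idx
# buy_idx    3    7   10   14  19
#      1  1.04   NA   NA   NA  NA
#      5    NA 0.97   NA   NA  NA
#      9    NA   NA 1.07   NA  NA
#      12   NA   NA   NA 1.21  NA
#      16   NA   NA   NA   NA 1.1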
The task is to generate the moving compound ROC over every possible window of buy-sell pairs. This is how I'm currently solving it:
for(i in 2:5){
  # extend every window ending at pair i-1 by one more buy-sell pair
  r[1:(i-1), i] <- r[1:(i-1), i-1] * r[i, i]
}
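After the loop, every upper-triangle cell compounds all the single-pair ROCs its window spans, for example:

r["1", "7"]    # 1.04 * 0.97        = 1.0088
r["1", "10"]   # 1.04 * 0.97 * 1.07 = 1.079416
r["12", "19"]  # 1.21 * 1.1         = 1.331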
As long as I'm not running this inside yet another loop, the runtime of my solution is quite acceptable. Is there a way to change this loop into a vectorized solution? And are there any good, well-documented tutorials for learning this vectorized way of thinking in R? That would be far more valuable to me than a one-off solution!
edit 20130709:
The next task is closely related to the previous one: apply a tax (given in % terms) to each transaction. Since each buy-sell pair involves two taxed transactions, a window spanning k pairs gets scaled by (1 - tax/100)^(2k). My current solution:
diag(r) <- diag(r) * (1 - tax/100)^2  # one pair = two taxed transactions
for(i in 2:dim(r)[2]){
  # row j of column i spans i-j+1 pairs, hence exponents 2*(i-j+1) = 2*(i:2)
  r[1:(i-1), i] <- r[1:(i-1), i] * (1 - tax/100)^(2*(i:2))
}
Do you know of a more efficient way? Or a more correct one, if this doesn't handle every case?
If d contains your diagonal elements, then for every j >= i, r[i,j] is prod(d[i:j]), which can also be written prod(d[1:j]) / prod(d[1:(i-1)]). Hence this trick, using outer to build the ratios of cumulative products:
d <- c(1.04, 0.97, 1.07, 1.21, 1.1)
n <- length(d)
p <- cumprod(c(1, d))                 # p[k+1] = prod(d[1:k]), with p[1] = 1
r <- t(outer(p, 1/p, "*"))[-n-1, -1]  # r[i, j] = p[j+1] / p[i] = prod(d[i:j])
r[lower.tri(r)] <- NA                 # a sell before its buy is meaningless
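As a quick sanity check (my addition, not part of the original answer), this reproduces the loop result:

r.loop <- diag(d)
for(i in 2:n) r.loop[1:(i-1), i] <- r.loop[1:(i-1), i-1] * r.loop[i, i]
r.loop[lower.tri(r.loop)] <- NA
all.equal(r, r.loop)  # TRUE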
Some benchmarks showing that it does better than the OP's loop for some (but not all) input sizes; at the largest size below, the cost of the big temporary matrices created by outer and t presumably takes over:
OP <- function(d) {
  r <- diag(d)
  for(i in 2:length(d)){
    r[1:(i-1), i] <- r[1:(i-1), i-1] * r[i, i]
  }
  r
}

flodel <- function(d) {
  n <- length(d)
  p <- cumprod(c(1, d))
  r <- t(outer(p, 1/p, "*"))[-n-1, -1]
  r[lower.tri(r)] <- NA
  r
}
library(microbenchmark)

d <- runif(10)
microbenchmark(OP(d), flodel(d))
# Unit: microseconds
#        expr     min       lq   median      uq     max
# 1 flodel(d)  83.028  85.6135  88.4575  90.153 144.111
# 2     OP(d) 115.993 122.0075 123.4730 126.826 206.892
d <- runif(100)
microbenchmark(OP(d), flodel(d))
# Unit: microseconds
#        expr      min       lq    median       uq      max
# 1 flodel(d)  490.819  545.528  549.6095  566.108  684.043
# 2     OP(d) 1227.235 1260.823 1282.9880 1313.264 3913.322
d <- runif(1000)
microbenchmark(OP(d), flodel(d))
# Unit: milliseconds
#        expr      min        lq    median        uq       max
# 1 flodel(d) 97.78687 106.39425 121.13807 133.99502 154.67168
# 2     OP(d) 53.49014  60.10124  72.56427  85.17864  91.89011
edit to answer the 20130709 addition:
I'll assume tax is a scalar, and let z <- (1 - tax/100)^2. Your final result is r multiplied by a matrix of z raised to various powers. What you want to avoid is computing these powers over and over again. Here is what I would do:
pow <- 1L + col(r) - row(r)  # pow[i, j] = number of buy-sell pairs in window (i, j)
pow[lower.tri(pow)] <- NA
tax.mult <- (z^(1:n))[pow]   # compute z^1 ... z^n once, then just look them up
r <- r * tax.mult
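As a quick sanity check (my addition, using a hypothetical tax of 2% and the untaxed matrix from flodel() above), the vectorized version agrees with the questioner's loop:

tax <- 2                              # hypothetical tax rate in percent
z <- (1 - tax/100)^2
r0 <- flodel(d)                       # untaxed compound ROC matrix

pow <- 1L + col(r0) - row(r0)
pow[lower.tri(pow)] <- NA
r.vec <- r0 * (z^(1:length(d)))[pow]  # vectorized tax application

r.loop <- r0                          # the questioner's loop, for comparison
diag(r.loop) <- diag(r.loop) * z
for(i in 2:length(d)) r.loop[1:(i-1), i] <- r.loop[1:(i-1), i] * z^(i:2)

all.equal(r.vec, r.loop)              # TRUE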