Minimalist example of what I'm trying to do:
dX_i <- rnorm(100, 0, 0.0002540362)
p_vec <- seq(0, 1, 0.25)
gamma_vec <- seq(1, 2, 0.25)
a_vec <- seq(2, 6, 1)
sigma_hat_vec <- c(0.03201636, 0.05771143, 0.07932116, 0.12262327, 0.15074560)
delta_j_vec <- c(0.0000005850109, 0.0000011700217, 0.0000017550326, 0.0000035100651, 0.0000052650977)
parameters <- expand.grid("p" = p_vec, "gamma" = gamma_vec, "a" = a_vec, "sigma_hat" = sigma_hat_vec, "delta_j" = delta_j_vec)
result <- sapply(1:nrow(parameters), function(x) {
tmp <- parameters[x,]
p <- tmp$p
a <- tmp$a
gamma <- tmp$gamma
sigma_hat <- tmp$sigma_hat
delta_j <- tmp$delta_j
B <- sum( (abs(dX_i)^p) * ( abs(dX_i) < gamma * a * sigma_hat * delta_j^(1/2) ))
return(B)
})
Goal: I need to calculate B on the vector dX_i for all combinations of p, a, gamma, sigma_hat and delta_j. However, in reality the grid parameters has ~600k rows and dX_i has length ~80k. Moreover, I have a list of ~1000 such dX_i vectors. Therefore, I want to make this calculation as efficient as possible. Other approaches, e.g. converting parameters to a data.table and running sapply within that data.table, did not seem to give a speedup.
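For reference, the full-sized inputs would look something like this (sizes shown for scale only; the values are simulated stand-ins):
dX_list <- replicate(1000, rnorm(80000, 0, 0.0002540362), simplify = FALSE)  # ~1000 dX_i vectors, each ~80k long
# parameters is built with expand.grid() as above, just with finer grids (~600k rows)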
I tried parallelizing the function (I am limited to running the script on a virtual Windows machine):
library(parallel)
numCores <- detectCores() - 1  # leave one core free
cl <- makePSOCKcluster(numCores)
num.iter <- 1:nrow(parameters)
result_par <- parSapply(cl, num.iter, function(x, parameters, dX_i) {
tmp <- parameters[x,]
p <- tmp$p
a <- tmp$a
gamma <- tmp$gamma
sigma_hat <- tmp$sigma_hat
delta_j <- tmp$delta_j
sum( (abs(dX_i)^p) * ( abs(dX_i) < gamma * a * sigma_hat * delta_j^(1/2) ))
}, parameters, dX_i)
stopCluster(cl)
While this gave me a speedup, I still feel like I'm not really solving this problem in the most efficient way and would appreciate any suggestions.
@josliber's answer is very good. Yet, it makes it look like R is bad... and you have to switch to C++ for performance.
There are three tricks implemented in their answer:
- precompute the vector of thresholds gamma * a * sigma_hat * delta_j^(1/2) for all parameter rows at once,
- precompute abs(dX_i) instead of recomputing it for every row,
- sort abs(dX_i) so that the loop over its values can stop early.
The first two tricks are an R technique called "vectorization": do your operations (e.g. gamma * a * sigma_hat * delta_j^(1/2) or abs()) on whole vectors instead of on single elements within a loop. This is exactly what you do when using sum( dX_i^p * vec_boolean ); it is vectorized (both the * and the sum), so it should be very fast.
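As a tiny illustration of what vectorization buys you (a toy example, separate from the actual solution below):
# element-by-element: one function call per value
res_loop <- sapply(dX_i, function(x) abs(x)^2)
# vectorized: a single call over the whole vector
res_vec <- abs(dX_i)^2
all.equal(res_loop, res_vec)  # TRUE, but the vectorized form is much faster at scale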
If we implement just these two tricks (we can't really do the third one the same way because it breaks vectorization), it gives:
abs_dX_i <- abs(dX_i)
thresh <- with(parameters, gamma * a * sigma_hat * sqrt(delta_j))
p <- parameters$p
result3 <- sapply(1:nrow(parameters), function(i) {
in_sum <- (abs_dX_i < thresh[i])
sum(abs_dX_i[in_sum]^p[i])
})
all.equal(result, result3) # TRUE
If we benchmark all three solutions (the RCPP entry uses the proc() function from @josliber's answer below):
microbenchmark::microbenchmark(
OP = {
result <- sapply(1:nrow(parameters), function(x) {
tmp <- parameters[x,]
p <- tmp$p
a <- tmp$a
gamma <- tmp$gamma
sigma_hat <- tmp$sigma_hat
delta_j <- tmp$delta_j
B <- sum( (abs(dX_i)^p) * ( abs(dX_i) < gamma * a * sigma_hat * delta_j^(1/2) ))
return(B)
})
},
RCPP = {
result2 <- proc(sort(abs(dX_i)), parameters$gamma * parameters$a *
parameters$sigma_hat * parameters$delta_j^(1/2), parameters$p)
},
R_VEC = {
abs_dX_i <- abs(dX_i)
thresh <- with(parameters, gamma * a * sigma_hat * sqrt(delta_j))
p <- parameters$p
result3 <- sapply(1:nrow(parameters), function(i) {
in_sum <- (abs_dX_i < thresh[i])
sum(abs_dX_i[in_sum]^p[i])
})
},
times = 10
)
We get:
Unit: milliseconds
expr min lq mean median uq max neval
OP 224.8414 235.4075 289.90096 270.2767 347.1727 399.3262 10
RCPP 14.8172 15.4691 18.83703 16.3979 20.3829 29.6624 10
R_VEC 28.3136 29.5964 32.82456 31.4124 33.2542 45.8199 10
This gives a huge speedup from only a slight modification of your original R code. It is less than twice as slow as the Rcpp code, and it can easily be parallelized as you previously did with parSapply().
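For example, a rough sketch of that parallelization (the worker count of 4 is purely illustrative; the precomputed objects must be exported to the workers):
library(parallel)
cl <- makePSOCKcluster(4)                        # PSOCK clusters also work on Windows
clusterExport(cl, c("abs_dX_i", "thresh", "p"))  # ship the precomputed objects to each worker
result_par <- parSapply(cl, seq_along(thresh), function(i) {
  in_sum <- abs_dX_i < thresh[i]
  sum(abs_dX_i[in_sum]^p[i])
})
stopCluster(cl)
all.equal(result, result_par)  # should be TRUE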
When I want to speed up hard-to-vectorize code, I often turn to Rcpp. At the end of the day you are trying to sum up abs(dX_i)^p, limited to the values of abs(dX_i) smaller than the threshold gamma * a * sigma_hat * delta_j^(1/2). You want to do this for a bunch of pairs of p and a threshold. You could accomplish this with:
library(Rcpp)
cppFunction(
"NumericVector proc(NumericVector dX_i, NumericVector thresh, NumericVector p) {
const int n = thresh.size();
const int m = dX_i.size();
NumericVector B(n);
for (int i=0; i < n; ++i) {
B[i] = 0;
for (int j=0; j < m; ++j) {
if (dX_i[j] < thresh[i]) {
B[i] += pow(dX_i[j], p[i]);
} else {
break;
}
}
}
return B;
}"
)
result2 <- proc(sort(abs(dX_i)), parameters$gamma * parameters$a * parameters$sigma_hat * parameters$delta_j^(1/2), parameters$p)
all.equal(result, result2)
# [1] TRUE
Note that my code sorts the absolute value of dX_i so it can stop the calculation once it encounters the first value exceeding the threshold.
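As an aside, the same sorted-early-stop idea can also be sketched in plain R with findInterval(), reusing the thresh and p vectors computed above. Note that findInterval() uses <= at the boundary instead of <, which only matters for exact ties with the threshold (essentially impossible with continuous data):
s <- sort(abs(dX_i))
k <- findInterval(thresh, s)  # how many sorted values fall below each threshold
result4 <- sapply(seq_along(thresh), function(i) sum(s[seq_len(k[i])]^p[i]))
all.equal(result, result4)  # TRUE here (no exact ties)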
On my machine, I see a ~20x speedup, from 0.158 seconds for your code to 0.007 seconds for the Rcpp code (measured using system.time()).
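Since you mention having ~1000 dX_i vectors, a short sketch of reusing the precomputed thresholds across all of them (assuming the vectors are stored in a list, called dX_list here for illustration):
thresh <- with(parameters, gamma * a * sigma_hat * sqrt(delta_j))  # compute once
results_all <- lapply(dX_list, function(v) proc(sort(abs(v)), thresh, parameters$p))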