Minimalist example of what I'm trying to do:
dX_i <- rnorm(100, 0, 0.0002540362)
p_vec <- seq(0, 1, 0.25)
gamma_vec <- seq(1, 2, 0.25)
a_vec <- seq(2, 6, 1)
sigma_hat_vec <- c(0.03201636, 0.05771143, 0.07932116, 0.12262327, 0.15074560)
delta_j_vec <- c(0.0000005850109, 0.0000011700217, 0.0000017550326, 0.0000035100651, 0.0000052650977)
parameters <- expand.grid("p" = p_vec, "gamma" = gamma_vec, "a" = a_vec, "sigma_hat" = sigma_hat_vec, "delta_j" = delta_j_vec)
result <- sapply(1:nrow(parameters), function(x) {
tmp <- parameters[x,]
p <- tmp$p
a <- tmp$a
gamma <- tmp$gamma
sigma_hat <- tmp$sigma_hat
delta_j <- tmp$delta_j
B <- sum( (abs(dX_i)^p) * ( abs(dX_i) < gamma * a * sigma_hat * delta_j^(1/2) ))
return(B)
})
Goal: I need to calculate B on the vector dX_i for all combinations of p, a, gamma, sigma_hat and delta_j. However, in reality the grid parameters has ~600k rows and dX_i has length ~80k. Moreover, I have a list of ~1000 such dX_i vectors. Therefore, I want to make this calculation as efficient as possible. Other approaches, e.g. converting parameters to a data.table and running sapply within that data.table, did not seem to give a speedup.
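For reference, the full-sized inputs would look something like this (sizes shown for scale only; the values are simulated stand-ins):
dX_list <- replicate(1000, rnorm(80000, 0, 0.0002540362), simplify = FALSE)  # ~1000 dX_i vectors, each ~80k long
# parameters is built with expand.grid() as above, just with finer grids (~600k rows)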
I tried parallelizing the function (I am limited to running the script on a virtual Windows machine):
library(parallel)
numCores <- detectCores() - 1  # leave one core free
cl <- makePSOCKcluster(numCores)
num.iter <- 1:nrow(parameters)
result_par <- parSapply(cl, num.iter, function(x, parameters, dX_i) {
tmp <- parameters[x,]
p <- tmp$p
a <- tmp$a
gamma <- tmp$gamma
sigma_hat <- tmp$sigma_hat
delta_j <- tmp$delta_j
sum( (abs(dX_i)^p) * ( abs(dX_i) < gamma * a * sigma_hat * delta_j^(1/2) ))
}, parameters, dX_i)
stopCluster(cl)
While this gave me a speedup, I still feel like I'm not really solving this problem in the most efficient way and would appreciate any suggestions.
@josliber's answer is very good. Yet, it makes it look like R is bad... and you have to switch to C++ for performance.
There are three tricks implemented in their answer:
- precompute the vector of thresholds gamma * a * sigma_hat * delta_j^(1/2) for all parameter rows at once,
- precompute abs(dX_i) instead of recomputing it for every row,
- sort abs(dX_i) so that the loop over its values can stop early.
The first two tricks are an R technique called "vectorization": do your operations (e.g. gamma * a * sigma_hat * delta_j^(1/2) or abs()) on whole vectors instead of on single elements within a loop. This is exactly what you do when using sum( dX_i^p * vec_boolean ); it is vectorized (both the * and the sum), so it should be very fast.
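As a tiny illustration of what vectorization buys you (a toy example, separate from the actual solution below):
# element-by-element: one function call per value
res_loop <- sapply(dX_i, function(x) abs(x)^2)
# vectorized: a single call over the whole vector
res_vec <- abs(dX_i)^2
all.equal(res_loop, res_vec)  # TRUE, but the vectorized form is much faster at scale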
If we implement just these two tricks (we can't really do the third one the same way because it breaks vectorization), it gives:
abs_dX_i <- abs(dX_i)
thresh <- with(parameters, gamma * a * sigma_hat * sqrt(delta_j))
p <- parameters$p
result3 <- sapply(1:nrow(parameters), function(i) {
in_sum <- (abs_dX_i < thresh[i])
sum(abs_dX_i[in_sum]^p[i])
})
all.equal(result, result3) # TRUE
If we benchmark all three solutions (the RCPP entry uses the proc() function from @josliber's answer below):
microbenchmark::microbenchmark(
OP = {
result <- sapply(1:nrow(parameters), function(x) {
tmp <- parameters[x,]
p <- tmp$p
a <- tmp$a
gamma <- tmp$gamma
sigma_hat <- tmp$sigma_hat
delta_j <- tmp$delta_j
B <- sum( (abs(dX_i)^p) * ( abs(dX_i) < gamma * a * sigma_hat * delta_j^(1/2) ))
return(B)
})
},
RCPP = {
result2 <- proc(sort(abs(dX_i)), parameters$gamma * parameters$a *
parameters$sigma_hat * parameters$delta_j^(1/2), parameters$p)
},
R_VEC = {
abs_dX_i <- abs(dX_i)
thresh <- with(parameters, gamma * a * sigma_hat * sqrt(delta_j))
p <- parameters$p
result3 <- sapply(1:nrow(parameters), function(i) {
in_sum <- (abs_dX_i < thresh[i])
sum(abs_dX_i[in_sum]^p[i])
})
},
times = 10
)
We get:
Unit: milliseconds
expr min lq mean median uq max neval
OP 224.8414 235.4075 289.90096 270.2767 347.1727 399.3262 10
RCPP 14.8172 15.4691 18.83703 16.3979 20.3829 29.6624 10
R_VEC 28.3136 29.5964 32.82456 31.4124 33.2542 45.8199 10
This gives a huge speedup from only a slight modification of your original R code. It is less than twice as slow as the Rcpp code, and it can easily be parallelized as you previously did with parSapply().
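For example, a rough sketch of that parallelization (the worker count of 4 is purely illustrative; the precomputed objects must be exported to the workers):
library(parallel)
cl <- makePSOCKcluster(4)                        # PSOCK clusters also work on Windows
clusterExport(cl, c("abs_dX_i", "thresh", "p"))  # ship the precomputed objects to each worker
result_par <- parSapply(cl, seq_along(thresh), function(i) {
  in_sum <- abs_dX_i < thresh[i]
  sum(abs_dX_i[in_sum]^p[i])
})
stopCluster(cl)
all.equal(result, result_par)  # should be TRUE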
When I want to speed up hard-to-vectorize code, I often turn to Rcpp. At the end of the day you are trying to sum up abs(dX_i)^p, limited to the values of abs(dX_i) smaller than the threshold gamma * a * sigma_hat * delta_j^(1/2). You want to do this for a bunch of pairs of p and a threshold. You could accomplish this with:
library(Rcpp)
cppFunction(
"NumericVector proc(NumericVector dX_i, NumericVector thresh, NumericVector p) {
const int n = thresh.size();
const int m = dX_i.size();
NumericVector B(n);
for (int i=0; i < n; ++i) {
B[i] = 0;
for (int j=0; j < m; ++j) {
if (dX_i[j] < thresh[i]) {
B[i] += pow(dX_i[j], p[i]);
} else {
break;
}
}
}
return B;
}"
)
result2 <- proc(sort(abs(dX_i)), parameters$gamma * parameters$a * parameters$sigma_hat * parameters$delta_j^(1/2), parameters$p)
all.equal(result, result2)
# [1] TRUE
Note that my code sorts the absolute value of dX_i so it can stop the calculation once it encounters the first value exceeding the threshold.
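As an aside, the same sorted-early-stop idea can also be sketched in plain R with findInterval(), reusing the thresh and p vectors computed above. Note that findInterval() uses <= at the boundary instead of <, which only matters for exact ties with the threshold (essentially impossible with continuous data):
s <- sort(abs(dX_i))
k <- findInterval(thresh, s)  # how many sorted values fall below each threshold
result4 <- sapply(seq_along(thresh), function(i) sum(s[seq_len(k[i])]^p[i]))
all.equal(result, result4)  # TRUE here (no exact ties)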
On my machine, I see a ~20x speedup, from 0.158 seconds for your code to 0.007 seconds for the Rcpp code (measured using system.time()).
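Since you mention having ~1000 dX_i vectors, a short sketch of reusing the precomputed thresholds across all of them (assuming the vectors are stored in a list, called dX_list here for illustration):
thresh <- with(parameters, gamma * a * sigma_hat * sqrt(delta_j))  # compute once
results_all <- lapply(dX_list, function(v) proc(sort(abs(v)), thresh, parameters$p))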