How can I efficiently vectorize the integrate function in R?

I am integrating numerically with integrate(). The function I am integrating depends on several parameters, and I want to calculate the integral for a large number of parameter values. I currently do this with mapply().

MWE:

fc <- function(x, a, b) a*x + b
fc.wrapper <- function(p.a, p.b) integrate(fc, 0, 1, a = p.a, b = p.b)$value
a.vec <- seq(0, 1, length.out = 100)
b.vec <- seq(0, 1, length.out = 100) 
system.time(mapply(fc.wrapper, a.vec, b.vec))

Output:

   user  system elapsed 
  0.003   0.000   0.004 

The problem is that in my real application this takes too long to run. So my question is: are there faster alternatives to mapply()? I have tried future_mapply(), but it doesn't seem to speed things up.

Tried future.apply():

library(future.apply)
plan(multisession, workers = 6L)
system.time(future_mapply(fc.wrapper, a.vec, b.vec, future.seed = TRUE))

Output:

   user  system elapsed 
  0.315   0.008   0.752

I am also open to suggestions for faster user-written alternatives to integrate().

asked by s.willis

1 Answer

For a smooth 1-D function, if you have enough memory you can vectorize the integration using straight (non-adaptive) Gauss-Legendre quadrature with a relatively large number of nodes. I'll demonstrate with a function of moderate computational cost:

f1 <- function(x, A, B, C, D) {
  (A*lgamma(x) - B*log(x) + C*qgamma(pcauchy(x), 5))*exp(sin(D*x))
}

a <- 0; b <- 100 # integration extents
A <- B <- C <- D <- seq(0, 1, length.out = 20) # parameter values
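If you haven't used Gauss-Legendre quadrature before, the recipe is: statmod::gauss.quad(n) returns n nodes and weights on [-1, 1]; rescale both to the target interval and the integral becomes a weighted sum of function values. A minimal sketch on a known integral:

# Sketch: integrate x^2 on [0, 1]; the exact answer is 1/3. A 5-node
# Gauss-Legendre rule is exact for polynomials up to degree 9.
gq <- statmod::gauss.quad(5)
x01 <- gq$nodes/2 + 1/2   # map nodes from [-1, 1] to [0, 1]
w01 <- gq$weights/2       # scale weights by (1 - 0)/2
sum(w01*x01^2)            # 1/3 up to floating-point error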

I'll explore the full 20^4 parameter space using expand.grid():

system.time({
  ba2 <- (b - a)/2 # to normalize the integration extents to [-1, 1]
  nodes <- statmod::gauss.quad(100) # calculate the nodes
  x <- nodes[[1]]*ba2 + (b + a)/2
  w <- nodes[[2]]*ba2
  df1 <- expand.grid(x = x, A = A, B = B, C = C, D = D)
  df1 <- cbind(
    expand.grid(A = A, B = B, C = C, D = D),
    v = c(crossprod(w, matrix(do.call(f1, df1), length(x))))
  )
})
#>    user  system elapsed 
#>   23.01    0.12   23.15
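The crossprod() call is what vectorizes the integration: each column of the matrix holds f1 evaluated at all 100 nodes for one parameter combination, so a single matrix product returns every integral at once. A small sketch of the equivalence for a single combination:

# Sketch: crossprod(w, M) is t(w) %*% M, i.e. the weighted column sums,
# which for a one-column matrix reduces to the ordinary quadrature sum.
M <- matrix(f1(x, 0.5, 0.5, 0.5, 0.5), ncol = 1)
all.equal(c(crossprod(w, M)), sum(w*f1(x, 0.5, 0.5, 0.5, 0.5)))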

Spot check the integration:

sum(f1(x, 0.5, 0.5, 0.5, 0.5)*w)
#> [1] 9919.591
integrate(\(x) f1(x, 0.5, 0.5, 0.5, 0.5), a, b, rel.tol = 1e-6)
#> 9919.598 with absolute error < 0.00023
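If you're unsure how many nodes are enough, a quick check (a sketch, not part of the benchmark above) is to increase the node count and watch the estimate stabilize:

# Sketch: re-run the quadrature with increasing node counts; the result
# should settle once the rule resolves the integrand.
for (n in c(25, 50, 100, 200)) {
  gq <- statmod::gauss.quad(n)
  xn <- gq$nodes*ba2 + (b + a)/2
  wn <- gq$weights*ba2
  print(sum(wn*f1(xn, 0.5, 0.5, 0.5, 0.5)))
}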

If the function is amenable to broadcasting, the integration can be sped up even more using outer:

f2 <- function(x, A, B, C, D) {
  (outer(lgamma(x), A) - outer(log(x), B) + outer(qgamma(pcauchy(x), 5), C))*
    exp(sin(outer(x, D)))
}
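Note that outer() is doing the broadcasting here: it evaluates every node/parameter pair in one call and returns a length(x)-by-length(A) matrix, and because transform() below passes the full expand.grid columns, each column of f2's result lines up with one row of df2. A tiny illustration:

# Sketch: outer(x, A) forms the full grid of products as a matrix,
# one column per parameter value.
m <- outer(1:3, c(10, 20))
dim(m)    # 3 2
m[2, 2]   # 2*20 = 40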

system.time({
  ba2 <- (b - a)/2
  nodes <- statmod::gauss.quad(100)
  x <- nodes[[1]]*ba2 + (b + a)/2
  w <- nodes[[2]]*ba2
  df2 <- expand.grid(A = A, B = B, C = C, D = D)
  df2 <- transform(df2, v = c(crossprod(w, f2(x, A, B, C, D))))
})
#>    user  system elapsed 
#>    1.45    0.05    1.52

all.equal(df1$v, df2$v)
#> [1] TRUE

This is all using a single thread. It would be straightforward to do the above in parallel.
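As a hedged sketch of what that could look like (reusing the future.apply setup from the question; the row chunking here is only for illustration, not tuned):

# Sketch: chunk the parameter grid across workers and apply the same
# crossprod() trick within each chunk; future.apply should detect and
# export the globals f2, x, and w automatically.
library(future.apply)
plan(multisession, workers = 6L)
grp <- rep_len(1:6, nrow(df2))
chunks <- split(df2[c("A", "B", "C", "D")], grp)
v.par <- unsplit(
  future_lapply(chunks, \(p) c(crossprod(w, f2(x, p$A, p$B, p$C, p$D)))),
  grp
)
all.equal(v.par, df2$v)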

Compare to the mapply approach in the question:

system.time({
  f3 <- function(A, B, C, D) {
    integrate(f1, a, b, A = A, B = B, C = C, D = D)$value
  }
  
  df3 <- expand.grid(A = A, B = B, C = C, D = D)
  df3 <- transform(df3, v = mapply(f3, A, B, C, D))
})
#>    user  system elapsed 
#>  150.28    0.04  150.58

all.equal(df1$v, df3$v)
#> [1] "Mean relative difference: 2.018989e-05"

The results agree to within integrate()'s default relative tolerance, so the small difference is just adaptive-quadrature error, not a bug.
answered by jblood94


