How can I efficiently vectorize the integrate function in R?

I am integrating numerically with integrate(). The function I am integrating depends on several parameters, and I want to calculate the integral for a large number of parameter values. I currently do this with mapply().

MWE:

fc <- function(x, a, b) a*x + b
fc.wrapper <- function(p.a, p.b) integrate(fc, 0, 1, a = p.a, b = p.b)$value
a.vec <- seq(0, 1, length.out = 100)
b.vec <- seq(0, 1, length.out = 100) 
system.time(mapply(fc.wrapper, a.vec, b.vec))

Output:

   user  system elapsed 
  0.003   0.000   0.004 

The problem is that in my real application this takes too long to run. So my question is: are there faster alternatives to mapply()? I have tried future_mapply(), but it doesn't seem to speed things up.

Tried future.apply():

library(future.apply)
plan(multisession, workers = 6L)
system.time(future_mapply(fc.wrapper, a.vec, b.vec, future.seed = TRUE))

Output:

   user  system elapsed 
  0.315   0.008   0.752

I am also open to suggestions for faster user-written alternatives to integrate().

asked by s.willis

1 Answer

For a smooth 1-D function, if you have enough memory you can vectorize the integration using straight (non-adaptive) Gauss-Legendre quadrature with a relatively large number of nodes. I'll demonstrate with a function of moderate computational cost:

f1 <- function(x, A, B, C, D) {
  (A*lgamma(x) - B*log(x) + C*qgamma(pcauchy(x), 5))*exp(sin(D*x))
}

a <- 0; b <- 100 # integration extents
A <- B <- C <- D <- seq(0, 1, length.out = 20) # parameter values
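If you haven't used Gauss-Legendre quadrature before, the recipe is: statmod::gauss.quad(n) returns n nodes and weights on [-1, 1]; rescale both to the target interval and the integral becomes a weighted sum of function values. A minimal sketch on a known integral:

# Sketch: integrate x^2 on [0, 1]; the exact answer is 1/3. A 5-node
# Gauss-Legendre rule is exact for polynomials up to degree 9.
gq <- statmod::gauss.quad(5)
x01 <- gq$nodes/2 + 1/2   # map nodes from [-1, 1] to [0, 1]
w01 <- gq$weights/2       # scale weights by (1 - 0)/2
sum(w01*x01^2)            # 1/3 up to floating-point error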

I'll explore the full 20^4 parameter space using expand.grid():

system.time({
  ba2 <- (b - a)/2 # to normalize the integration extents to [-1, 1]
  nodes <- statmod::gauss.quad(100) # calculate the nodes
  x <- nodes[[1]]*ba2 + (b + a)/2
  w <- nodes[[2]]*ba2
  df1 <- expand.grid(x = x, A = A, B = B, C = C, D = D)
  df1 <- cbind(
    expand.grid(A = A, B = B, C = C, D = D),
    v = c(crossprod(w, matrix(do.call(f1, df1), length(x))))
  )
})
#>    user  system elapsed 
#>   23.01    0.12   23.15
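The crossprod() call is what vectorizes the integration: each column of the matrix holds f1 evaluated at all 100 nodes for one parameter combination, so a single matrix product returns every integral at once. A small sketch of the equivalence for a single combination:

# Sketch: crossprod(w, M) is t(w) %*% M, i.e. the weighted column sums,
# which for a one-column matrix reduces to the ordinary quadrature sum.
M <- matrix(f1(x, 0.5, 0.5, 0.5, 0.5), ncol = 1)
all.equal(c(crossprod(w, M)), sum(w*f1(x, 0.5, 0.5, 0.5, 0.5)))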

Spot check the integration:

sum(f1(x, 0.5, 0.5, 0.5, 0.5)*w)
#> [1] 9919.591
integrate(\(x) f1(x, 0.5, 0.5, 0.5, 0.5), a, b, rel.tol = 1e-6)
#> 9919.598 with absolute error < 0.00023
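If you're unsure how many nodes are enough, a quick check (a sketch, not part of the benchmark above) is to increase the node count and watch the estimate stabilize:

# Sketch: re-run the quadrature with increasing node counts; the result
# should settle once the rule resolves the integrand.
for (n in c(25, 50, 100, 200)) {
  gq <- statmod::gauss.quad(n)
  xn <- gq$nodes*ba2 + (b + a)/2
  wn <- gq$weights*ba2
  print(sum(wn*f1(xn, 0.5, 0.5, 0.5, 0.5)))
}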

If the function is amenable to broadcasting, the integration can be sped up even more using outer:

f2 <- function(x, A, B, C, D) {
  (outer(lgamma(x), A) - outer(log(x), B) + outer(qgamma(pcauchy(x), 5), C))*
    exp(sin(outer(x, D)))
}
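Note that outer() is doing the broadcasting here: it evaluates every node/parameter pair in one call and returns a length(x)-by-length(A) matrix, and because transform() below passes the full expand.grid columns, each column of f2's result lines up with one row of df2. A tiny illustration:

# Sketch: outer(x, A) forms the full grid of products as a matrix,
# one column per parameter value.
m <- outer(1:3, c(10, 20))
dim(m)    # 3 2
m[2, 2]   # 2*20 = 40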

system.time({
  ba2 <- (b - a)/2
  nodes <- statmod::gauss.quad(100)
  x <- nodes[[1]]*ba2 + (b + a)/2
  w <- nodes[[2]]*ba2
  df2 <- expand.grid(A = A, B = B, C = C, D = D)
  df2 <- transform(df2, v = c(crossprod(w, f2(x, A, B, C, D))))
})
#>    user  system elapsed 
#>    1.45    0.05    1.52

all.equal(df1$v, df2$v)
#> [1] TRUE

This is all using a single thread. It would be straightforward to do the above in parallel.
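As a hedged sketch of what that could look like (reusing the future.apply setup from the question; the row chunking here is only for illustration, not tuned):

# Sketch: chunk the parameter grid across workers and apply the same
# crossprod() trick within each chunk; future.apply should detect and
# export the globals f2, x, and w automatically.
library(future.apply)
plan(multisession, workers = 6L)
grp <- rep_len(1:6, nrow(df2))
chunks <- split(df2[c("A", "B", "C", "D")], grp)
v.par <- unsplit(
  future_lapply(chunks, \(p) c(crossprod(w, f2(x, p$A, p$B, p$C, p$D)))),
  grp
)
all.equal(v.par, df2$v)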

Compare to the mapply approach in the question:

system.time({
  f3 <- function(A, B, C, D) {
    integrate(f1, a, b, A = A, B = B, C = C, D = D)$value
  }
  
  df3 <- expand.grid(A = A, B = B, C = C, D = D)
  df3 <- transform(df3, v = mapply(f3, A, B, C, D))
})
#>    user  system elapsed 
#>  150.28    0.04  150.58

all.equal(df1$v, df3$v)
#> [1] "Mean relative difference: 2.018989e-05"

The results agree to within integrate()'s default relative tolerance, so the small difference is just adaptive-quadrature error, not a bug.
answered by jblood94


