
do.call 20% slower than a normal call in R?

I am not sure whether I am using do.call the right way:

test <- function(test) {
  string <- deparse(substitute(test))
  start <- regexpr("\\(", string)
  end <- regexpr(")", string) - 1
  distribution <- substr(string, 0, start-1)
  string.arguments <- substr(string, start+1, end)
  v <- read.table(text=unlist(strsplit(string.arguments, ",")))
  list.arguments <- lapply(t(v), function(x) x)

  for (i in 1:1000000) {
    do.call(distribution, list.arguments)
  } 
}

The goal is to pass a distribution function, such as rnorm or rgamma, together with its arguments, so that test receives the unevaluated call rather than its result.

Here is a comparison of using do.call and just simply calling the function:

> system.time(test(rnorm(100, 1, 10))) 
   user  system elapsed 
   17.772   0.000  17.820 
> system.time(for(i in 1:1000000) { rnorm(100,0,1)} )
   user  system elapsed 
   13.940   0.004  14.015 

The question is twofold:

  • Does do.call really have to take 20% longer?
  • Is this the right approach to accept varying distributions and arguments?
PascalVKooten asked Dec 27 '22 00:12



1 Answer

do.call is always going to be slower than calling a function directly, because it has to look up the function and build the call from your argument list before it can evaluate it. That overhead is roughly fixed per call, so how much slower it is depends on how much work the called function does: the more computation per call, the more the overhead is amortised.

> system.time(for(i in 1:1e6) do.call(rnorm, list(100)))
   user  system elapsed 
  13.55    0.00   13.58 
> system.time(for(i in 1:1e6) rnorm(100))
   user  system elapsed 
  11.40    0.00   11.42 

whereas:

> system.time(for(i in 1:1e2) do.call(rnorm, list(1e6)))
   user  system elapsed 
   9.14    0.00    9.15 
> system.time(for(i in 1:1e2) rnorm(1e6))
   user  system elapsed 
   9.14    0.00    9.14 
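
As a minimal sketch of what is being compared here: do.call builds the call from a function (or its name as a string) plus a list of arguments, and the result is the same as a direct call. The seed, the argument values and `n_calls` below are arbitrary choices for illustration, and the timings will differ from machine to machine.

```r
# The same draw three ways: direct call, do.call with a function object,
# and do.call with the function's name as a string.
args <- list(5, mean = 1, sd = 10)

set.seed(42)
direct <- rnorm(5, mean = 1, sd = 10)

set.seed(42)
via_function <- do.call(rnorm, args)

set.seed(42)
via_name <- do.call("rnorm", args)   # the name is looked up on each call

print(identical(direct, via_function))
print(identical(direct, via_name))

# A cheap function (one draw per call) makes the fixed per-call cost visible.
# n_calls is an arbitrary, illustrative loop count.
n_calls <- 1e5
t_direct <- system.time(for (i in seq_len(n_calls)) rnorm(1))
t_docall <- system.time(for (i in seq_len(n_calls)) do.call(rnorm, list(1)))
print(c(direct = t_direct[["elapsed"]], docall = t_docall[["elapsed"]]))
```

Note that the test function in the question passes the distribution as a string, so it takes the name-lookup path shown in `via_name`.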

In addition, some of your slowdown comes from the regex and other string manipulation, which has nothing to do with how fast do.call inherently is. That part is fast only because it runs on trivially small input, and it is still needlessly complicated. Why not just do this:

test <- function(distrib, ..., N=1e6)
{
    # forward the remaining arguments straight to the distribution function
    lapply(seq_len(N), function(x) distrib(...))
}

test(rnorm, 100, 1, 10)

or this:

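test <- function(call, N=1e6)
{
    call <- substitute(call)   # capture the unevaluated expression
    env <- parent.frame()      # the environment test() was called from
    # evaluate in the caller's environment so that any variables used
    # in the call are resolved there, not inside lapply
    lapply(seq_len(N), function(...) eval(call, env))
}

test(rnorm(100, 1, 10))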
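
Both versions above keep every result, so with N=1e6 draws of 100 values the returned list holds about 800 MB of doubles. For trying the idea out, a smaller sketch along the same lines may be easier to work with. The helper name `draw_repeated` and its `n_rep` parameter are hypothetical, not part of the answer above, and the use of match.fun to accept a function name as a string is an added assumption about what is wanted.

```r
# Hypothetical helper: call `distrib` n_rep times, forwarding any
# further arguments through `...` (positional or named).
draw_repeated <- function(distrib, ..., n_rep = 3) {
  distrib <- match.fun(distrib)   # accept a function or its name as a string
  lapply(seq_len(n_rep), function(i) distrib(...))
}

set.seed(1)
a <- draw_repeated(rnorm, 4, mean = 1, sd = 10)

set.seed(1)
b <- draw_repeated("rnorm", 4, mean = 1, sd = 10)

set.seed(1)
g <- draw_repeated(rgamma, 4, shape = 2, rate = 1)

print(identical(a, b))   # same seed, same function, same arguments
print(lengths(g))        # number of values drawn in each repetition
```

Because the arguments travel through `...`, switching from rnorm to rgamma needs no change to the helper, only to the arguments supplied by the caller.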
Hong Ooi answered Jan 08 '23 19:01
