
Why is running R code inside a function faster?

Tags:

r

Consider the following examples:

> start<-Sys.time()
> for(i in 1:10000){}
> Sys.time()-start
Time difference of 0.01399994 secs
> 
> fn<-function(){
+   start<-Sys.time()
+   for(i in 1:10000){}
+   Sys.time()-start
+ }
> fn()
Time difference of 0.00199604 secs



start<-Sys.time()
for(i in 1:10000){x<-100}
Sys.time()-start
Time difference of 0.012995 secs
fn<-function(){
  start<-Sys.time()
  for(i in 1:10000){x<-100}
  Sys.time()-start
}
fn()
Time difference of 0.008996964 secs

The result is the same after increasing the number of iterations, as shown below:

> sim<-10000000
> start<-Sys.time()
> for(i in 1:sim){x<-i}
> Sys.time()-start
Time difference of 2.832 secs
> 
> fn<-function(){
+   start<-Sys.time()
+   for(i in 1:sim){x<-i}
+   Sys.time()-start
+ }
> fn()
Time difference of 2.017997 secs

I am guessing this is not a coincidence! Why does R code run faster in a function?

Masoud asked Apr 28 '26 01:04


2 Answers

Functions in R are byte-compiled by the JIT compiler. Once that has happened, most functions run faster.

As the docs in ?compiler::enableJIT say,

JIT is disabled if the argument is 0. If level is 1 then larger closures are compiled before their first use. If level is 2, then some small closures are also compiled before their second use. If level is 3 then in addition all top level loops are compiled before they are executed. JIT level 3 requires the compiler option optimize to be 2 or 3. The JIT level can also be selected by starting R with the environment variable R_ENABLE_JIT set to one of these values. Calling enableJIT with a negative argument returns the current JIT level. The default JIT level is 3.

As a result, many functions will be faster than top-level code.

To demonstrate the impact of the JIT, I used this benchmark:
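The behaviour described in the quoted help text can be tried out directly. The following is a minimal sketch, assuming a standard R installation with the bundled compiler package; the variable names are illustrative only.

```r
# Query the current JIT level without changing it
# (per the help text, a negative argument only reports the level).
current_level <- compiler::enableJIT(-1)
cat("current JIT level:", current_level, "\n")

# enableJIT() returns the previous level, so the setting can be restored later.
previous_level <- compiler::enableJIT(0)   # switch the JIT off
# ... run timings with the JIT disabled here ...
compiler::enableJIT(previous_level)        # restore the earlier setting
```

Saving and restoring the level keeps an experiment from silently changing the behaviour of the rest of the session.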

library(microbenchmark)

compiler::enableJIT(0)  # use 3 for testing with full JIT compiler

fn <- function() {
   for(i in 1:10000) {}
}

microbenchmark(for_loop_without_func = for(i in 1:10000) {},
               for_loop_in_func = fn(),
               times = 100)

# Run e.g. as follows (to avoid RStudio or other overhead):
# R --vanilla < jit_test.R

The result shows that with the JIT disabled the execution times are nearly the same:

Unit: microseconds
                  expr     min       lq     mean   median      uq     max neval
 for_loop_without_func 180.619 180.7990 182.7129 180.9290 181.050 239.489   100
      for_loop_in_func 182.582 182.7075 186.2232 182.7625 182.938 309.912   100

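With compiler::enableJIT(3) (which is the default) the function is faster, with a median of about 54 microseconds versus about 657 for the top-level loop, roughly a factor of 12: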

Unit: microseconds
                  expr     min       lq      mean   median       uq      max neval
 for_loop_without_func 558.727 574.4875 659.21931 657.3425 702.6475 1984.351   100
      for_loop_in_func  53.019  53.4955  61.59588  53.7260  54.0320  790.632   100

Interestingly, enabling the JIT seems to slow down the code running outside the function (compared to the first "no JIT" benchmark), even though that code will not be optimized. It would be interesting to understand why (perhaps the JIT needs time to work out which code it will not optimize?).
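The effect of compilation itself can also be looked at in isolation, without relying on the JIT to decide when to compile. The sketch below is an illustration rather than part of the benchmark above: it switches the JIT off, byte-compiles one copy of a function by hand with compiler::cmpfun(), and times both copies. The function names loop_plain and loop_compiled and the loop body are made up for the example, and the timings will vary by machine and R version.

```r
# Keep the JIT from compiling anything on its own, remembering the old level.
old_level <- compiler::enableJIT(0)

# A small loop-heavy function; with the JIT off it stays uncompiled.
loop_plain <- function(n) {
  x <- 0
  for (i in seq_len(n)) x <- x + i
  x
}

# Explicitly byte-compiled copy of the same function.
loop_compiled <- compiler::cmpfun(loop_plain)

n <- 1e6
t_plain    <- system.time(r1 <- loop_plain(n))[["elapsed"]]
t_compiled <- system.time(r2 <- loop_compiled(n))[["elapsed"]]

# Both versions must compute the same value; only the run time should differ.
stopifnot(identical(r1, r2))
cat("interpreted:", t_plain, "s   byte-compiled:", t_compiled, "s\n")

# Restore the JIT level that was active before.
compiler::enableJIT(old_level)
```

If the compiled copy is consistently faster, that supports the view that byte-compilation, rather than the function call as such, accounts for the difference.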

R Yoda answered May 05 '26 07:05


