Consider the following examples:
> start<-Sys.time()
> for(i in 1:10000){}
> Sys.time()-start
Time difference of 0.01399994 secs
>
> fn<-function(){
+ start<-Sys.time()
+ for(i in 1:10000){}
+ Sys.time()-start
+ }
> fn()
Time difference of 0.00199604 secs
>
> start<-Sys.time()
> for(i in 1:10000){x<-100}
> Sys.time()-start
Time difference of 0.012995 secs
>
> fn<-function(){
+ start<-Sys.time()
+ for(i in 1:10000){x<-100}
+ Sys.time()-start
+ }
> fn()
Time difference of 0.008996964 secs
The result is the same after increasing the number of iterations, as shown below:
> sim<-10000000
> start<-Sys.time()
> for(i in 1:sim){x<-i}
> Sys.time()-start
Time difference of 2.832 secs
>
> fn<-function(){
+ start<-Sys.time()
+ for(i in 1:sim){x<-i}
+ Sys.time()-start
+ }
> fn()
Time difference of 2.017997 secs
I am guessing this is not a coincidence! Why does R code run faster in a function?
Functions in R are compiled to byte code by the JIT (just-in-time) compiler. Once that has happened, most functions run faster.
As the docs in ?compiler::enableJIT say,
JIT is disabled if the argument is 0. If level is 1 then larger closures are compiled before their first use. If level is 2, then some small closures are also compiled before their second use. If level is 3 then in addition all top level loops are compiled before they are executed. JIT level 3 requires the compiler option optimize to be 2 or 3. The JIT level can also be selected by starting R with the environment variable R_ENABLE_JIT set to one of these values. Calling enableJIT with a negative argument returns the current JIT level. The default JIT level is 3.
So many functions will run faster than the equivalent top-level code.
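As a minimal sketch of the mechanism described in the quoted documentation, the snippet below reads the current JIT level and byte-compiles a function explicitly with compiler::cmpfun instead of waiting for the JIT. The function sum_loop is a hypothetical example made up for illustration; it is not part of the question.

```r
library(compiler)

# A negative argument only reports the current JIT level; it changes nothing.
level <- enableJIT(-1)
print(level)

# Hypothetical loop-heavy function, used only for illustration.
sum_loop <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + i
  total
}

# Byte-compile it explicitly rather than relying on the JIT.
sum_loop_compiled <- cmpfun(sum_loop)

# Compilation must not change the result; only the speed may differ.
stopifnot(identical(sum_loop(1000), sum_loop_compiled(1000)))
```

With the JIT left at its default level, sum_loop is likely to be compiled automatically anyway, so a speed difference between the two versions would mainly be visible after compiler::enableJIT(0).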
To demonstrate the impact of the JIT, I used this benchmark:
library(microbenchmark)

# 0 disables the JIT; use 3 to test with the full JIT compiler
compiler::enableJIT(0)

fn <- function() {
  for (i in 1:10000) {}
}

microbenchmark(for_loop_without_func = for (i in 1:10000) {},
               for_loop_in_func      = fn(),
               times = 100)

# Run e.g. with the following, to avoid RStudio or other overhead:
# R --vanilla < jit_test.R
With the JIT disabled, the execution times are nearly the same:
Unit: microseconds
                  expr     min       lq     mean   median      uq     max neval
 for_loop_without_func 180.619 180.7990 182.7129 180.9290 181.050 239.489   100
      for_loop_in_func 182.582 182.7075 186.2232 182.7625 182.938 309.912   100
With compiler::enableJIT(3) (which is the default) the function is faster:
Unit: microseconds
                  expr     min       lq      mean   median       uq      max neval
 for_loop_without_func 558.727 574.4875 659.21931 657.3425 702.6475 1984.351   100
      for_loop_in_func  53.019  53.4955  61.59588  53.7260  54.0320  790.632   100
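Interestingly, enabling the JIT seems to slow down the code running outside of the function (compared to the first "no JIT" benchmark), even though that code does not end up any faster. It would be interesting to understand why. Perhaps the JIT needs time to work out which code it will not optimize; or, going by the documentation quoted above, perhaps the top-level loop is compiled again each time the expression is evaluated, whereas the function is compiled only once.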
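For readers who would rather compare both settings in one session than edit and rerun the script, here is a rough sketch that relies on enableJIT returning the previous level. It uses base R's system.time instead of microbenchmark, and the names fn0, fn3, t0 and t3 are made up for illustration. Each function is defined after the JIT level is set, and the timings are single runs, so treat the numbers as indicative only.

```r
library(compiler)

# Switch the JIT off; enableJIT() returns the level that was active before.
old <- enableJIT(0)
fn0 <- function() for (i in 1:1e6) {}
fn0()                                   # warm-up call
t0 <- system.time(fn0())[["elapsed"]]

# Switch to the full JIT and time a freshly defined, identical function.
enableJIT(3)
fn3 <- function() for (i in 1:1e6) {}
fn3()                                   # warm-up call so compilation can happen first
t3 <- system.time(fn3())[["elapsed"]]

# Restore whatever level the session started with.
enableJIT(old)

cat("elapsed with JIT 0:", t0, "s; with JIT 3:", t3, "s\n")
```

The code is kept at top level on purpose: wrapping it in a helper function could let the JIT compile that helper, and the loop function defined inside it, before the level is switched off.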