This is really blowing my mind. The basic loop takes like 8 seconds on my computer:
system.time({
  x <- 0
  for (p in 1:2) {
    for (i in 1:500) {
      for (j in 1:5000) {
        x <- x + i * j
      }
    }
  }
})
x
Whereas if I use foreach in non-parallel mode, it takes only 0.7 seconds!
library(foreach)  # provides foreach() and %do%

system.time({
  x <- 0
  foreach(p = 1:2, .combine = rbind) %do%
    for (i in 1:500) {
      for (j in 1:5000) {
        x <- x + i * j
      }
    }
})
x
The result is the same, but foreach was somehow able to reach it much faster than basic R! Where is the inefficiency of basic R?
In fact, I got the complete opposite of the result reported in this question: Why is foreach() %do% sometimes slower than for?
Loops are slower in R than in C++ because R is an interpreted language (not compiled), although just-in-time (JIT) compilation in R (>= 3.4) makes R loops faster (though still not as fast). That said, R loops are not that bad if you don't use too many iterations (say, no more than 100,000).
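To see what JIT compilation does to a loop like the one in the question, here is a minimal sketch that times the same function with the JIT switched off and then on. It assumes R >= 3.4 and the compiler package that ships with R; sum_products is a hypothetical helper named only for this illustration, and the timings will differ from machine to machine.

```r
library(compiler)  # ships with R; provides enableJIT()

# Hypothetical helper for illustration: the inner loops from the question.
sum_products <- function() {
  x <- 0
  for (i in 1:500) {
    for (j in 1:5000) {
      x <- x + i * j
    }
  }
  x
}

# Switch the JIT off; enableJIT() returns the previously active level.
old_level <- enableJIT(0)
t_off <- system.time(res_off <- sum_products())[["elapsed"]]

# Level 3 has been the default since R 3.4.
enableJIT(3)
t_on <- system.time(res_on <- sum_products())[["elapsed"]]

# Restore whatever level was active before this experiment.
enableJIT(old_level)

cat("JIT off:", t_off, "s; JIT on:", t_on, "s\n")
```

If the question's 8 seconds came from an older R, or from a session with the JIT disabled, this is probably the gap being measured.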
The for loop is faster than the foreach loop if the array must only be accessed once per iteration.
Foreach performance is approximately 6 times slower than FOR / FOREACH performance. The FOR loop without length caching works 3 times slower on lists, compared to arrays. The FOR loop with length caching works 2 times slower on lists, compared to arrays.
The last point is well covered on SO, for example in this Answer, and applies if the code involved in setting up and operating the loop is a significant part of the overall computational burden of the loop. Many people think for() loops are slow because they, the users, are writing bad code.
Loops are especially slow in R. If you run or plan to run computationally expensive tasks, you should pre-allocate memory. This technique consists of reserving space for the objects you are creating or filling inside a loop.
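As a rough illustration of pre-allocation, the sketch below fills a vector two ways: by growing it with c() on every iteration, and by reserving the full length up front. The function names grow and prealloc and the size n are made up for this example.

```r
n <- 10000

# Growing: every c() call allocates a new vector and copies the old elements.
grow <- function(n) {
  out <- c()
  for (i in seq_len(n)) {
    out <- c(out, i^2)
  }
  out
}

# Pre-allocating: reserve all n slots once, then fill them in place.
prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) {
    out[i] <- i^2
  }
  out
}

# Both approaches should produce the same vector; only the cost differs.
stopifnot(identical(grow(n), prealloc(n)))

system.time(grow(n))
system.time(prealloc(n))
```

The gap between the two timings should widen as n grows, since the growing version copies more elements on each pass.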
The foreach function can make your loops faster, but this depends on your loop. For example, a function named for_each could compute the square root of the corresponding value at each iteration.
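The for_each function itself is not shown here, so the following is only a guess at what it might look like: a sequential foreach loop that returns the square root of each index. The use of .combine = c to get a plain numeric vector is an assumption of this sketch.

```r
library(foreach)  # provides foreach() and %do%

# Hypothetical reconstruction: square root of 1..n, one value per iteration.
for_each <- function(n) {
  foreach(i = seq_len(n), .combine = c) %do% {
    sqrt(i)
  }
}

roots <- for_each(4)
print(roots)
```

For a body this cheap, the per-iteration overhead of foreach probably dominates, so a plain for loop or the vectorised sqrt(1:4) would likely be faster.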
It is very important to understand that for-loops in R do not iterate over regular sequences, but over a collection of objects. For that reason, we are able to loop through vectors of character strings.
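A short sketch of that point: the loop variable takes each element of the collection in turn, whatever its type. The vectors fruits and items are made-up example data.

```r
# Looping over a character vector: fruit is a string on each pass.
fruits <- c("apple", "banana", "cherry")
for (fruit in fruits) {
  cat(fruit, "has", nchar(fruit), "characters\n")
}

# A list works too, and its elements need not share a type.
items <- list(1L, "two", c(3, 4))
for (item in items) {
  cat(class(item), "of length", length(item), "\n")
}
```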
You should also take a look at package {purrr} that provides shortcuts, consistency and some functions to iterate over rows of a data frame.
foreach, when used sequentially, eventually uses the compiler package to produce compiled byte code using the non-exported functions make.codeBuf and cmp. You can use cmpfun to compile the inner loop into bytecode to simulate this and achieve a similar speedup.
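library(foreach)   # provides foreach() and %do%
library(compiler)  # provides cmpfun()

# Plain nested for loops
f.original <- function() {
  x <- 0
  for (p in 1:2) {
    for (i in 1:500) {
      for (j in 1:5000) {
        x <- x + i * j
      }
    }
  }
  x
}

# Outer loop replaced by a sequential foreach
f.foreach <- function() {
  x <- 0
  foreach(p = 1:2, .combine = rbind) %do%
    for (i in 1:500) {
      for (j in 1:5000) {
        x <- x + i * j
      }
    }
  x
}

# Inner loops byte-compiled explicitly with cmpfun
f.cmpfun <- function() {
  f <- cmpfun(function(x) {
    for (i in 1:500) {
      for (j in 1:5000) {
        x <- x + i * j
      }
    }
    x
  })
  f(f(0))  # two passes, matching p in 1:2
}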
Results
library(microbenchmark)
microbenchmark(f.original(),f.foreach(),f.cmpfun(), times=5)
Unit: milliseconds
         expr       min        lq    median        uq       max neval
 f.original() 4033.6114 4051.5422 4061.7211 4072.6700 4079.0338     5
  f.foreach()  426.0977  429.6853  434.0246  437.0178  447.9809     5
   f.cmpfun()  418.2016  427.9036  441.7873  444.1142  444.4260     5
all.equal(f.original(), f.foreach())
[1] TRUE
all.equal(f.original(), f.cmpfun())
[1] TRUE