 

Why is R for loop 10 times slower than when using foreach?

This is really blowing my mind. The basic loop takes about 8 seconds on my computer:

system.time({
x <- 0
for (p in 1:2) {
    for (i in 1:500) {
        for (j in 1:5000) {
            x <- x + i * j
        }
    }
}
})
x

Whereas if I use foreach in non-parallel mode, it takes only 0.7 seconds!

library(foreach)
system.time({
x <- 0
foreach(p = 1:2, .combine = rbind) %do% 
    for (i in 1:500) {
        for (j in 1:5000) {
            x <- x + i * j
        }
    }
})
x

The result is the same, but foreach was somehow able to reach it much faster than base R! Where is the inefficiency in base R?
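As a sanity check on "the result is the same", here is a sketch (not from the original post) that computes the expected value directly. Because the loops add i * j over every (i, j) pair, once for each p, the total factorises into 2 * sum(1:500) * sum(1:5000); outer() gives a vectorised cross-check. The names x_closed and x_outer are illustrative.

```r
# The loops add i * j for every pair (i, j), once for each p in 1:2.
# The double sum factorises: sum_i sum_j (i * j) = sum(i) * sum(j).
x_closed <- 2 * sum(1:500) * sum(1:5000)

# Vectorised cross-check: outer() builds the 500 x 5000 matrix of products.
# as.numeric() avoids integer overflow when sum() adds up the whole matrix.
x_outer <- 2 * sum(outer(as.numeric(1:500), 1:5000))

stopifnot(x_closed == x_outer)
print(x_closed, digits = 15)  # 3131876250000
```

Every loop version in this thread should end with x equal to this value.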

How is this possible?

In fact, I got the completely opposite result from the one in this question: Why is foreach() %do% sometimes slower than for?

Tomas asked Jul 09 '14 10:07


People also ask

Why is a for loop so slow in R?

Loops are slower in R than in C++ because R is an interpreted language (not compiled), even though R (>= 3.4) now has just-in-time (JIT) compilation that makes R loops faster (though still not as fast). R loops are not that bad as long as you don't use too many iterations (say, no more than 100,000).
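A minimal sketch of what the JIT changes, assuming R >= 3.4 and the base compiler package: it switches the JIT off with enableJIT(0) to mimic the interpreter the question was timed on, then times the question's inner double loop both interpreted and byte-compiled with cmpfun(). The name sum_products is made up for the example, and timings will differ by machine.

```r
library(compiler)

# Turn the JIT off so newly created closures stay interpreted
# (the pre-3.4 default); enableJIT() returns the previous level.
old_level <- enableJIT(0)

# Illustrative function: the inner double loop from the question
sum_products <- function() {
  x <- 0
  for (i in 1:500) {
    for (j in 1:5000) {
      x <- x + i * j
    }
  }
  x
}

# cmpfun() returns a byte-compiled copy regardless of the JIT level
sum_products_c <- cmpfun(sum_products)

print(system.time(r_interp <- sum_products()))
print(system.time(r_comp <- sum_products_c()))

# Both versions compute the same value
stopifnot(identical(r_interp, r_comp))

# Restore the previous JIT level
invisible(enableJIT(old_level))
```

With the JIT left at its default level, R compiles loop-heavy closures like this one automatically, so the gap reported in the question should be much smaller in current R.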

Why are loops slower?

Loop overhead matters when the code involved in setting up and operating the loop is a significant part of the overall computational burden of the loop; this point is well covered on Stack Overflow. Many people think for() loops are slow because they, the users, are writing inefficient code.

Why are loops so slow in R?

Loops are especially slow in R. If you run or plan to run computationally expensive tasks, you must pre-allocate memory. This technique consists of reserving space for the objects you are creating or filling inside a loop.
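A minimal sketch of pre-allocation, with illustrative function names grow and prealloc: the first version grows its result with c() on every iteration, which copies the vector built so far each time, while the second reserves the full vector once with numeric(n) and fills it in place.

```r
n <- 10000

# Growing a vector: each c() call creates a new vector and copies the old one
grow <- function(n) {
  out <- c()
  for (i in seq_len(n)) out <- c(out, i^2)
  out
}

# Pre-allocating: reserve space once, then assign into existing slots
prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}

print(system.time(a <- grow(n)))
print(system.time(b <- prealloc(n)))

# Same result, but the pre-allocated version avoids repeated copying
stopifnot(identical(a, b))
```

The gap widens as n grows, because the total copying done by grow() increases roughly with the square of n.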

Does a for_each() function make loops faster?

This function can make your loops faster, but that depends on the loop. The example this answer refers to (not included here) defined a function named for_each that computed the square root of the corresponding value at each iteration.
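The for_each function mentioned above is not reproduced here. As a rough stand-in, this sketch runs the same square-root-per-iteration task with the foreach package discussed in this thread (foreach() with %do%), next to a plain for loop and the vectorised sqrt() that base R already provides. It assumes the foreach package is installed.

```r
library(foreach)

vals <- 1:10

# Sequential foreach: .combine = c collects each iteration's result into a vector
res_foreach <- foreach(v = vals, .combine = c) %do% sqrt(v)

# Plain for loop filling a pre-allocated vector
res_for <- numeric(length(vals))
for (k in seq_along(vals)) res_for[k] <- sqrt(vals[k])

# Vectorised base R: no explicit loop at all
res_vec <- sqrt(vals)

stopifnot(all.equal(res_foreach, res_vec), all.equal(res_for, res_vec))
```

For a task this simple, the vectorised call is usually the fastest and shortest option.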

How do for-loops work in R?

It is very important to understand that for-loops in R do not iterate over regular sequences, but over a collection of objects. For that reason, we are able to loop through vectors of character strings.
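A short illustration of that point, using made-up values: the loop variable takes each element of a character vector (or a list) in turn, rather than a numeric index.

```r
fruits <- c("apple", "banana", "cherry")

# 'fruit' is each string in turn, not a position in the vector
for (fruit in fruits) {
  cat(fruit, "has", nchar(fruit), "characters\n")
}

# The same holds for a list, whose elements may have different types
items <- list(42L, "text", TRUE)
item_classes <- character(length(items))  # pre-allocated result
k <- 0
for (item in items) {
  k <- k + 1
  item_classes[k] <- class(item)
}
print(item_classes)
```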

Is there a way to iterate over a row in R?

You should also take a look at the {purrr} package, which provides shortcuts, consistency and some functions for iterating over the rows of a data frame.
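The {purrr} functions are not shown here. As a base-R sketch of iterating over rows, this loops over row indices with seq_len(nrow(df)) and reads each row's values by column name. The data frame df and its columns are made up for the example.

```r
# Illustrative data frame
df <- data.frame(name = c("a", "b", "c"), w = c(2, 3, 4), h = c(5, 6, 7))

# Pre-allocate one result per row
area <- numeric(nrow(df))

# seq_len(nrow(df)) also gives an empty loop for a zero-row data frame
for (r in seq_len(nrow(df))) {
  this_row <- df[r, ]  # a one-row data frame
  area[r] <- this_row$w * this_row$h
}
df$area <- area
print(df)

# Vectorised equivalent: column arithmetic needs no loop at all
stopifnot(identical(df$area, df$w * df$h))
```

When the per-row work is plain arithmetic, the vectorised form on the last line is the idiomatic choice; a row loop is mainly useful when each row needs more complex, non-vectorisable processing.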




1 Answer

When used sequentially, foreach eventually uses the compiler package to produce byte code, via the non-exported functions make.codeBuf and cmp. You can use cmpfun to compile the inner loop into byte code to simulate this and achieve a similar speedup.

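library(foreach)   # foreach() and %do%
library(compiler)  # cmpfun()

# Plain nested for loops
f.original <- function() {
  x <- 0
  for (p in 1:2) {
    for (i in 1:500) {
      for (j in 1:5000) {
        x <- x + i * j
      }
    }
  }
  x
}

# Same loops, with the outer one run by a sequential foreach
f.foreach <- function() {
  x <- 0
  foreach(p = 1:2, .combine = rbind) %do%
    for (i in 1:500) {
      for (j in 1:5000) {
        x <- x + i * j
      }
    }
  x
}

# Byte-compile the inner double loop with cmpfun() and apply it twice,
# once for each value of p
f.cmpfun <- function() {
  f <- cmpfun(function(x) {
    for (i in 1:500) {
      for (j in 1:5000) {
        x <- x + i * j
      }
    }
    x
  })
  f(f(0))
}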

Results

library(microbenchmark)
microbenchmark(f.original(),f.foreach(),f.cmpfun(), times=5)
Unit: milliseconds
         expr       min        lq    median        uq       max neval
 f.original() 4033.6114 4051.5422 4061.7211 4072.6700 4079.0338     5
  f.foreach()  426.0977  429.6853  434.0246  437.0178  447.9809     5
   f.cmpfun()  418.2016  427.9036  441.7873  444.1142  444.4260     5
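# all.equal() compares two objects at a time (a third positional argument
# is taken as the tolerance), so check each pair separately
all.equal(f.original(), f.foreach())
[1] TRUE
all.equal(f.original(), f.cmpfun())
[1] TRUE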
James answered Sep 19 '22 01:09