Large loops hang in R?

Tags:

for-loop

r

Suppose I want to perform a simulation using the following function:

fn1 <- function(N) {
  res <- c()
  for (i in 1:N) {
    x <- rnorm(2)
    res <- c(res, x[2]-x[1])
  }
  res
}

For very large N, computation appears to hang. Are there better ways of doing this?

(Inspired by: https://stat.ethz.ch/pipermail/r-help/2008-February/155591.html)

Christopher DuBois asked Jul 23 '09 04:07


1 Answer

The efficiency of loops in R can often be improved considerably through the use of the apply functions, which apply a function to every element of a vector in a single call and collect the results, rather than an explicit loop that grows its result one element at a time. For the loop shown above, there are two basic operations happening during each iteration:

# A vector of two random numbers is generated
x <- rnorm( 2 )

# The difference between those numbers is calculated
x[2] - x[1]

In this case the appropriate function would be sapply(). sapply() operates on a vector or list, such as the vector generated by the expression 1:N, and returns a vector of results:

sapply( 1:N, function( i ){ x <- rnorm(2); return( x[2] - x[1] ) } )

Note that the index value i is available during the function call and successively takes on the values from 1 to N; however, it is not needed in this case.
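As a minimal sketch, the sapply() call above can be wrapped in a function so it is a drop-in replacement for fn1. The names fn2 and fn3 are hypothetical and do not come from the question. fn3 is a different technique from sapply(): it is fully vectorized, drawing all 2 * N values in one rnorm() call, and is included only for comparison.

```r
# fn2: the sapply() version of the loop (hypothetical name)
fn2 <- function(N) {
  sapply(seq_len(N), function(i) {
    x <- rnorm(2)
    x[2] - x[1]
  })
}

# fn3: a fully vectorized alternative (hypothetical name).
# All 2 * N draws come from one rnorm() call; the matrix is filled
# column by column, so each column holds one pair of draws.
fn3 <- function(N) {
  x <- matrix(rnorm(2 * N), nrow = 2)
  x[2, ] - x[1, ]
}

set.seed(1)
a <- fn2(1000)
set.seed(1)
b <- fn3(1000)
```

With R's default random number generator settings, both functions should consume the random stream in the same order, so a and b are expected to agree for the same seed. Timing each with system.time() on a large N is a simple way to check which one is faster on a given machine.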

Getting into the habit of recognizing where apply can be used instead of for is a very valuable skill: many R libraries for parallel computation provide plug-and-play parallelization through apply functions. Using apply can often give access to significant performance increases on multicore systems with little or no refactoring of code.
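As one possible illustration of this, the parallel package that ships with R provides parSapply(), which takes a cluster plus the same arguments as sapply(). The sketch below assumes a machine that can start two local worker processes; the function name par_diff, the worker count, and the seed 123 are arbitrary choices for the example, not part of the original answer.

```r
library(parallel)  # included with base R

# par_diff: hypothetical name; runs the same per-index function as the
# sapply() call, but spread across a small local cluster
par_diff <- function(N, workers = 2) {
  cl <- makeCluster(workers)
  on.exit(stopCluster(cl))        # shut the workers down on exit
  clusterSetRNGStream(cl, 123)    # give each worker its own seeded stream
  parSapply(cl, seq_len(N), function(i) {
    x <- rnorm(2)
    x[2] - x[1]
  })
}

res <- par_diff(10000)
length(res)
```

For a computation as cheap as this one, the cost of starting workers and moving data between processes may well outweigh any gain, so the parallel version is worth measuring before adopting it.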

Sharpie answered Sep 22 '22 00:09