Large loops hang in R?

Tags:

for-loop

r

Suppose I want to perform a simulation using the following function:

fn1 <- function(N) {
  res <- c()
  for (i in 1:N) {
    x <- rnorm(2)
    res <- c(res, x[2]-x[1])
  }
  res
}

For very large N, computation appears to hang. Are there better ways of doing this?

(Inspired by: https://stat.ethz.ch/pipermail/r-help/2008-February/155591.html)

443

asked Jul 23 '09 04:07

Christopher DuBois

1 Answer

The efficiency of loops like this one can be increased tremendously in R through the use of the apply functions, which collect the results for a whole vector of inputs in a single call rather than growing the result one element at a time (the repeated c(res, ...) in the loop copies res on every iteration). For the loop shown above, there are two basic operations happening during each iteration:

# A vector of two random numbers is generated
x <- rnorm( 2 )

# The difference between those numbers is calculated
x[2] - x[1]

In this case the appropriate function would be sapply(). sapply() operates on a list or vector, such as the vector 1:N produced in the loop statement, and returns a vector of results:

sapply( 1:N, function( i ){ x <- rnorm(2); return( x[2] - x[1] ) } )

Note that the index value i is available during the function call and successively takes on the values from 1 to N; however, it is not needed in this case.
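
As a rough sketch of how the sapply() call above slots back into the original function, the block below wraps it as fn2. It also shows two alternatives that are not part of this answer and are offered only as assumptions about what may help: fn3 keeps the for loop but preallocates the result, and fn4 draws all the random numbers in one call. The names fn2, fn3, and fn4 are hypothetical.

```r
# fn2: the sapply() version from above, wrapped as a function.
# seq_len(N) is used in place of 1:N so that N = 0 does not iterate over c(1, 0).
fn2 <- function(N) {
  sapply(seq_len(N), function(i) {
    x <- rnorm(2)
    x[2] - x[1]
  })
}

# fn3 (assumption, not from the answer): same loop as fn1, but the result
# vector is allocated once up front instead of being grown with c().
fn3 <- function(N) {
  res <- numeric(N)
  for (i in seq_len(N)) {
    x <- rnorm(2)
    res[i] <- x[2] - x[1]
  }
  res
}

# fn4 (assumption, not from the answer): draw all 2 * N values in one call.
# byrow = TRUE puts each consecutive pair of draws in its own row.
fn4 <- function(N) {
  x <- matrix(rnorm(2 * N), ncol = 2, byrow = TRUE)
  x[, 2] - x[, 1]
}
```

Wrapping each call in system.time(), for example system.time(fn1(1e5)) against system.time(fn2(1e5)), is one way to compare them on your own machine.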

Getting into the habit of recognizing where an apply function can be used in place of a for loop is a very valuable skill: many R libraries for parallel computation provide plug-and-play parallelization through apply-style functions. Using apply can often allow access to significant performance increases on multicore systems with little or no refactoring of code.
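
To illustrate the kind of drop-in replacement described above, here is a minimal sketch using mclapply() from the parallel package that ships with R. The answer does not name a specific library, so this choice is an assumption, as are the core count of 2 and the value of N. mclapply() returns a list, so unlist() is used to get a vector as sapply() would.

```r
library(parallel)  # included with base R

N <- 1000  # illustrative size only

# mclapply() relies on forking, which is not available on Windows,
# so fall back to a single core there. The value 2 is an arbitrary choice.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Same anonymous function as in the sapply() call; only the apply function changes.
res_list <- mclapply(seq_len(N), function(i) {
  x <- rnorm(2)
  x[2] - x[1]
}, mc.cores = n_cores)

res <- unlist(res_list)
```

For iterations as cheap as this one, the overhead of starting and coordinating worker processes may outweigh any gain, so parallelization tends to pay off only when each call does substantially more work.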

142

answered Sep 22 '22 00:09

Sharpie
