Is there a more efficient method than while loops for something that requires conditional checking?

Tags: loops, r

I have a problem that involves wrapping a while loop around a piece of code that I believe could be vectorized efficiently. However, my stopping condition depends on the value produced at each step. Consider this example as a representative model of my problem:
Generate N(0,1) random variables using rnorm() until you sample a value greater than an arbitrary value, k.

EDIT: A caveat of my problem, discussed in the comments, is that I cannot know a priori even a rough estimate of how many samples I will need before the stopping condition is met.
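As a baseline, here is a minimal sketch of the toy problem in its most literal form, drawing a single value per iteration. The function name `sample_until_naive` is hypothetical; it returns every draw up to and including the first one greater than `k`.

```r
# Literal version of the toy problem: one rnorm() call per iteration.
# Returns all draws up to and including the first draw greater than k.
sample_until_naive <- function(k) {
  values <- numeric(0)
  repeat {
    x <- rnorm(1)
    values <- c(values, x)  # growing a vector this way copies it on every append
    if (x > k) break
  }
  values
}

set.seed(1)
v <- sample_until_naive(2)
length(v)  # number of draws it took to exceed k
```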

One approach:

1. Using a while loop, sample a suitably sized vector of normal random variables (for instance, rnorm(50) to sample 50 standard normals at a time, or rnorm(1) if k is close to zero). Check this vector for any observations greater than k.
2. If there are any, stop and return all values up to that point. Otherwise, combine the vector from step 1 with a new vector obtained by repeating step 1.
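The first approach might look roughly like the following sketch, assuming a fixed batch size; `sample_until_chunked` and `batch` are hypothetical names. Batches are stored in a list and concatenated once at the end, which avoids repeatedly copying a growing vector.

```r
# Approach 1: draw fixed-size batches until some draw exceeds k.
sample_until_chunked <- function(k, batch = 50) {
  chunks <- list()
  i <- 0L
  repeat {
    i <- i + 1L
    x <- rnorm(batch)
    hit <- match(TRUE, x > k)  # index of the first draw > k, or NA if none
    if (!is.na(hit)) {
      chunks[[i]] <- x[seq_len(hit)]  # keep draws up to and including the hit
      break
    }
    chunks[[i]] <- x
  }
  unlist(chunks)  # concatenate all batches once, at the end
}

set.seed(42)
v <- sample_until_chunked(2)
```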

Another approach would be to draw a deliberately excessive ("overkill") number of random values for the given k. For example, if k = 2, sample 1,000 normal random variables using rnorm(1000).
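A sketch of the second approach, with a hypothetical function name `sample_until_overkill`: one large vectorized draw, truncated after the first value above `k`. It fails loudly if the chosen `n` turns out to be too small rather than silently returning an incomplete result.

```r
# Approach 2: one large vectorized draw, then truncate after the first hit.
sample_until_overkill <- function(k, n = 1000) {
  x <- rnorm(n)
  hit <- match(TRUE, x > k)  # NA if none of the n draws exceeds k
  if (is.na(hit)) stop("no draw exceeded k; n was too small")
  x[seq_len(hit)]            # draws up to and including the first hit
}

set.seed(42)
v <- sample_until_overkill(2, n = 1000)
```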

Thanks to R's vectorization, the second approach is faster than the loop version as long as the overkill number is not much larger than necessary. In my problem, however, I have no good intuition for how many draws I need, so I would have to be conservative.
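For the rnorm toy model only (not the real problem, where the EDIT says this is unknown), the overkill size can actually be computed. The number of draws until the first value above k is geometric with success probability p = P(Z > k), so its mean is 1/p, and n draws all fail with probability (1 - p)^n. A sketch, with hypothetical helper names and an assumed failure tolerance `eps`:

```r
# Probability that a single N(0,1) draw exceeds k
p_exceed <- function(k) pnorm(k, lower.tail = FALSE)

# Expected number of draws until the first value above k (geometric mean 1/p)
expected_draws <- function(k) 1 / p_exceed(k)

# Smallest n such that P(no draw above k among n draws) = (1 - p)^n <= eps
overkill_n <- function(k, eps = 1e-6) {
  p <- p_exceed(k)
  ceiling(log(eps) / log1p(-p))  # log1p(-p) computes log(1 - p) accurately
}

expected_draws(2)  # roughly 44
overkill_n(2)      # roughly 600
```

For k = 5, 1/p is already about 3.5 million draws, which shows how quickly a "safe" overkill size grows with the threshold.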

My question is: Is there a way to get a highly vectorized procedure like method 2 while still checking the stopping condition as in method 1? And is doing small vectorized operations like rnorm(50) the fastest option, given that the highly vectorized method is faster per element but more wasteful?

asked Apr 20 '12 18:04

Christopher Aden

1 Answer

Here is an implementation of my earlier suggestion: use your first approach, but increase the number of new samples at each iteration. For example, instead of drawing 50 new samples every time, double the batch size at each iteration: 50, then 100, 200, 400, and so on.

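Because the batch sizes follow a geometric sequence, the cumulative number of draws grows exponentially with the iteration count, so the loop exits after only a "few" iterations even when millions of draws are needed (16 iterations in the example below).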
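To make the "few iterations" claim concrete: with a starting batch of s0 = 50 and growth factor g = 2, the cumulative number of draws after m iterations is s0 (g^m - 1) / (g - 1), so the iteration count grows only logarithmically with the number of draws needed. A small sketch (helper names are hypothetical) that checks this against the example output shown below:

```r
# Cumulative draws after m iterations: s0 + s0*g + ... + s0*g^(m-1)
total_draws <- function(m, s0 = 50, g = 2) s0 * (g^m - 1) / (g - 1)

# Smallest m whose cumulative draws reach n
iterations_needed <- function(n, s0 = 50, g = 2) {
  ceiling(log(n * (g - 1) / s0 + 1, base = g))
}

total_draws(15)             # 1638350
total_draws(16)             # 3276750
iterations_needed(2747703)  # 16
```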

sample.until.thresh <- function(FUN, exit.thresh,
                                sample.start = 50,
                                sample.growth = 2) {

   sample.size    <- sample.start
   all.values     <- list()   # one element per batch of draws
   num.iterations <- 0L

   repeat {
      num.iterations <- num.iterations + 1L
      # Draw a whole batch in a single vectorized call
      sample.values  <- FUN(sample.size)
      all.values[[num.iterations]] <- sample.values

      above.thresh <- sample.values > exit.thresh
      if (any(above.thresh)) {
         # Truncate the batch after the first value above the threshold
         first.above <- match(TRUE, above.thresh)
         all.values[[num.iterations]] <- sample.values[seq_len(first.above)]
         break
      }

      # No hit yet: grow the next batch geometrically (50, 100, 200, ...)
      sample.size <- sample.size * sample.growth
   }

   # Concatenate all batches into a single vector
   all.values <- unlist(all.values)

   return(list(num.iterations = num.iterations,
               sample.size    = length(all.values),
               sample.values  = all.values))
}

set.seed(123456L)
res <- sample.until.thresh(rnorm, 5)
res$num.iterations
# [1] 16
res$sample.size
# [1] 2747703
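The same doubling idea can be generalized so that the stopping rule is an arbitrary vectorized predicate rather than a fixed threshold, which is closer to the general problem in the question. This is a sketch with hypothetical names (`sample_until_pred`, `stop_if`, `max_iter`); it assumes the condition can be checked element by element. A condition that depends on earlier batches (e.g. a running sum) would need that state carried between iterations.

```r
# Doubling-batch sampler with an arbitrary vectorized stopping predicate.
# stop_if(x) must return a logical vector the same length as x.
sample_until_pred <- function(FUN, stop_if, start = 50, growth = 2,
                              max_iter = 20L) {
  size <- start
  chunks <- vector("list", max_iter)  # preallocate one slot per iteration
  for (i in seq_len(max_iter)) {
    x <- FUN(size)
    hit <- match(TRUE, stop_if(x))    # first index where the condition holds, or NA
    if (!is.na(hit)) {
      chunks[[i]] <- x[seq_len(hit)]  # keep draws up to and including the hit
      return(unlist(chunks[seq_len(i)]))
    }
    chunks[[i]] <- x
    size <- size * growth             # grow the next batch geometrically
  }
  stop("stopping condition not met within max_iter iterations")
}

set.seed(123456L)
v <- sample_until_pred(rnorm, function(x) abs(x) > 4)
```

The `max_iter` cap is only a safety net: with doubling batches, the last allowed batch is already large (about 26 million draws for the defaults), so memory rather than iteration count is the practical limit.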

answered Oct 05 '22 01:10

flodel
