I have a problem that involves me wrapping a while loop around a bit of code that I believe can be vectorized efficiently. However, at each step, my stopping condition relies on the value at that stage. Consider this example as a representational model of my problem:
Generate N(0,1) random variables using rnorm() until you sample a value greater than an arbitrary value, k.
EDIT: A caveat of my problem, discussed in the comments, is that I cannot know, a priori, a good approximation of how many samples to take before my stopping condition.
One approach:
1. Using a while-loop, sample suitably sized normal random vectors (for instance, rnorm(50) to sample 50 standard normals at a time, or rnorm(1) if k is close to zero).
2. Check this vector to see if any observations are greater than k.
3. If yes, stop and return all preceding values. Otherwise, combine your vector from step 1 with a new vector you make by repeating step 1.
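The chunked approach described above could be sketched roughly as follows. The function name sample_fixed_chunks and its arguments are made up for illustration, and the sketch assumes the value that exceeds k should be kept as the last element of the result.

```r
# Hypothetical helper: draw standard normals in fixed-size chunks
# until one of them exceeds k.
sample_fixed_chunks <- function(k, chunk.size = 50L) {
  chunks <- list()   # collect chunks in a list, combine once at the end
  found <- FALSE
  while (!found) {
    x <- rnorm(chunk.size)
    hit <- which(x > k)              # positions in this chunk that exceed k
    if (length(hit) > 0L) {
      x <- x[seq_len(hit[1L])]       # keep values up to the first exceedance
      found <- TRUE
    }
    chunks[[length(chunks) + 1L]] <- x
  }
  unlist(chunks)
}

set.seed(1)
draws <- sample_fixed_chunks(k = 2)
length(draws)   # number of draws needed, including the one above k
```

Storing the chunks in a list and calling unlist() once avoids copying the growing vector on every iteration.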
Another approach would be to specify a completely overkill number of random draws for that given k. This might mean, if k = 2, sampling 1,000 normal random variables using rnorm(1000).
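A minimal sketch of this second approach, under the assumption that the caller picks the overkill size up front. The name sample_overkill and the NULL return value for "not enough draws" are illustrative choices, not part of the original question.

```r
# Hypothetical helper: draw n.max values in one vectorized call
# and cut the vector at the first value above k.
sample_overkill <- function(k, n.max = 1000L) {
  x <- rnorm(n.max)
  first <- match(TRUE, x > k)   # index of first exceedance, NA if there is none
  if (is.na(first)) {
    return(NULL)                # the chosen n.max was not large enough
  }
  x[seq_len(first)]
}

set.seed(42)
res <- sample_overkill(k = 2)
```

The NULL case is the weakness of this method: if n.max is too small the whole draw is wasted, and if it is far too large most of the draw is wasted.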
Leveraging the vectorization that R offers in the second case gives faster results than the loop version in cases where the overkill number is not too much larger than necessary, but in my problem, I don't have a good intuition for how many runs I need to do, so I'd need to be conservative.
The question follows: Is there a way to do a highly-vectorized procedure, like method 2, but using conditional checking like method 1? Is doing small vectorized operations like rnorm(50) the "fastest" way, when considering that the highly-vectorized method is element-per-element faster, but more wasteful?
It turns out that repeat is actually quite a bit more efficient than while. repeat also has the convenience that, in many situations, the condition is not known or even defined until inside the loop.
Simply put, when you want to check the condition first and then perform the operation, while is the better option; if you want to perform the operation at least once and then check the condition, do-while is the better option.
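R has no do-while keyword; a repeat loop with a break at the end of the body plays that role. The difference can be illustrated with a small sketch (the variable names are made up for the example):

```r
# Check first, then act: the body may run zero times.
n <- 0
count.while <- 0L
while (n > 0) {
  count.while <- count.while + 1L
  n <- n - 1
}

# Act first, then check: the body always runs at least once.
n <- 0
count.repeat <- 0L
repeat {
  count.repeat <- count.repeat + 1L
  n <- n - 1
  if (n <= 0) break
}

count.while    # 0
count.repeat   # 1
```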
All for loops can be written as while loops, and vice-versa. Just use whichever loop seems more appropriate to the task at hand. In general, you should use a for loop when you know how many times the loop should run.
The for loop places the initial condition, increment, and exit condition all in one place, making it easier to understand. The while loop spreads them around. For example, in your sample, what is the initial value of i? Oh, you forgot to specify it? That's the point.
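As a small illustration of that point, here is the same sum written both ways. This is only a sketch; the variable names are invented for the example.

```r
# for loop: the iteration range is stated in one place.
total.for <- 0
for (i in 1:5) {
  total.for <- total.for + i
}

# Equivalent while loop: initialisation, test, and increment are spread out.
total.while <- 0
i <- 1                 # initial value (easy to forget)
while (i <= 5) {       # exit condition
  total.while <- total.while + i
  i <- i + 1           # increment
}

total.for     # 15
total.while   # 15
```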
Here is an implementation of my earlier suggestion: use your first approach but increase the number of new samples between each iteration, e.g., instead of 50 new samples at each iteration, multiply that number by two between each iteration: 50, then 100, 200, 400, etc.
With your sample size following a divergent geometric series, you are guaranteed to exit in a "few" iterations.
sample.until.thresh <- function(FUN, exit.thresh,
                                sample.start = 50,
                                sample.growth = 2) {
  # FUN must take a sample size and return that many values, e.g. rnorm
  sample.size <- sample.start
  all.values <- list()
  num.iterations <- 0L
  repeat {
    num.iterations <- num.iterations + 1L
    sample.values <- FUN(sample.size)
    all.values[[num.iterations]] <- sample.values
    above.thresh <- sample.values > exit.thresh
    if (any(above.thresh)) {
      # truncate the last batch at the first value above the threshold
      first.above <- match(TRUE, above.thresh)
      all.values[[num.iterations]] <- sample.values[seq_len(first.above)]
      break
    }
    # no exceedance yet: grow the batch size geometrically
    sample.size <- sample.size * sample.growth
  }
  all.values <- unlist(all.values)
  return(list(num.iterations = num.iterations,
              sample.size = length(all.values),
              sample.values = all.values))
}
set.seed(123456L)
res <- sample.until.thresh(rnorm, 5)
res$num.iterations
# [1] 16
res$sample.size
# [1] 2747703
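As a quick sanity check on the numbers above, the cumulative number of draws after each iteration can be computed directly from the default start size of 50 and growth factor of 2. This sketch only does the arithmetic and does not rerun the sampler.

```r
sample.start <- 50
sample.growth <- 2
iteration <- 1:16

# batch size at each iteration: 50, 100, 200, ...
batch.size <- sample.start * sample.growth^(iteration - 1)
cumulative <- cumsum(batch.size)

cumulative[15]   # 1638350 draws available after 15 iterations
cumulative[16]   # 3276750 draws available after 16 iterations
```

The reported sample size of 2747703 falls between these two totals, which is consistent with the loop stopping partway through its 16th batch.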