Does anyone know how the replicate() function works in R and how efficient it is relative to using a for loop?
For example, is there any efficiency difference between...
means <- replicate(100000, mean(rnorm(50)))
And...
means <- c()
for(i in 1:100000) {
means <- c(means, mean(rnorm(50)))
}
(I may have typed something slightly off above, but you get the idea.)
You can just benchmark the code and get your answer empirically. Note that I also added a second for loop flavor which circumvents the growing vector problem by preallocating the vector.
repl_function = function(no_rep) means <- replicate(no_rep, mean(rnorm(50)))
for_loop = function(no_rep) {
means <- c()
for(i in 1:no_rep) {
means <- c(means, mean(rnorm(50)))
}
means
}
for_loop_prealloc = function(no_rep) {
means <- vector(mode = "numeric", length = no_rep)
for(i in 1:no_rep) {
means[i] <- mean(rnorm(50))
}
means
}
no_loops = 50e3
benchmark(repl_function(no_loops),
for_loop(no_loops),
for_loop_prealloc(no_loops),
replications = 3)
test replications elapsed relative user.self sys.self
2 for_loop(no_loops) 3 18.886 6.274 17.803 0.894
3 for_loop_prealloc(no_loops) 3 3.209 1.066 3.189 0.000
1 repl_function(no_loops) 3 3.010 1.000 2.997 0.000
user.child sys.child
2 0 0
3 0 0
1 0 0
Looking at the relative
column, the un-preallocated for loop is 6.2 times slower. However, the preallocated for loop is just as fast as replicate
.
replicate
is a wrapper for sapply
, which itself is a wrapper for lapply
. lapply
is ultimately an .Internal
function that is written in C and performs the looping in an optimised way, rather than through the interpreter. It's main advantages are efficient memory management, especially compared to the highly inefficient vector growing method you present above.
I have a very different experience with replicate
which also confuses me. It often happens that my R crashes and my laptop hangs when I use replicate
compared to for
and this surprises me, as for the reasons mentioned above, I also expected a C-written function to outperform the for
loop. For example, if you execute the functions below, you'll see that for
loop is faster than replicate
system.time(for (i in 1:10) runif(1e7))
# user system elapsed
# 3.340 0.218 3.558
system.time(replicate(10, runif(1e7)))
# user system elapsed
# 4.622 0.484 5.109
so with 10
replicates, the for
loop is clearly faster. If you repeat it for 100 replicates you get similar results. So I wonder if anyone can come with an example that shows its practical privileges compared to for
.
PS I also created a function for the runif(1e7)
and that made no difference in the comparison. Basically I failed to come with any example that shows the advantage of replicate
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With