I wrote the following two functions, one to test increasing vector size using concatenate, and one using brackets:
c_test <- function(n) {
  cv = c()
  for (i in 1:n) cv = c(cv, i)
  cv
}

b_test <- function(n) {
  bv = c()
  for (i in 1:n) bv[i] = i
  bv
}
library(microbenchmark)
microbenchmark(c_test(1e+4), b_test(1e+4), times = 100)
# Unit: milliseconds
#           expr       min        lq      mean    median        uq      max neval
#  c_test(10000) 140.27923 145.73282 156.82319 148.16175 151.74713 267.2393   100
#  b_test(10000)  49.58033  54.42992  56.24268  54.86033  56.30862 132.8394   100
This is a big time difference, and I don't understand why using brackets is so much better than using concatenate. It seems like both approaches would have to allocate new memory as the vector grows, but the timings say otherwise. I also thought it might be that c(v, x) is coercing x to the same type as v before merging, but using v[i] = as.vector(x) isn't a significant time cost either.
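(For what it's worth, the check I tried looked roughly like this; a_test is just a throwaway name:)

# throwaway check: same loop as b_test, but with an explicit as.vector() coercion
a_test <- function(n) {
  av <- c()
  for (i in 1:n) av[i] <- as.vector(i)
  av
}
microbenchmark(b_test(1e+4), a_test(1e+4), times = 100)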
This should probably be a comment, as I don't know the actual answer, but it was getting too long.
"c" and "[" are both primitive, internal and generic. This means method dispatch is done by C functions, and that's as far as I can get in answering your actual question. Something mysterious is going on there, and "[" is more efficient than "c" in this particular respect.
However, I did want to point out that, as per some but not all of the comments, both these methods are inefficient, and not just on account of vectorisation. Preallocating memory for the size of vector you expect helps very materially, much more so than the difference between c and [ does. Preallocation gives you a 70% to 90% speed-up relative to the [ version:
# very poor - repeated calls to c() to extend
c_test <- function(n) {
  cv = c()
  for (i in 1:n) cv = c(cv, i)
  cv
}

# slightly better - just use []
b_test <- function(n) {
  bv = c()
  for (i in 1:n) bv[i] = i
  bv
}

# much better practice - preallocate length of the vector
d_test <- function(n) {
  bv = numeric(n)
  for (i in 1:n) bv[i] = i
  bv
}

# good practice if possible - vectorisation
e_test <- function(n) {
  bv = 1:n
  bv
}
library(microbenchmark)
microbenchmark(c_test(1e+4), b_test(1e+4), d_test(1e+4), e_test(1e+4), times = 100)
This gives:
Unit: microseconds
          expr        min         lq         mean     median         uq        max neval cld
 c_test(10000) 102355.753 111202.568 129250.53638 114237.234 132468.938 220005.926   100   c
 b_test(10000)  47337.481  52820.938  77029.01728  59450.864 116529.185 192643.555   100   b
 d_test(10000)   6761.877   7492.741   7965.37288   7814.519   8353.778  11007.605   100   a
 e_test(10000)      3.555      6.321      9.32347      8.692     10.272     27.259   100   a
Also, as @Roland said, "growing an object gets more expensive with increasing size". Every call to c(cv, i) copies the whole existing vector before appending, and as the vector gets bigger there are fewer convenient spots available in memory to put it.
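A rough way to see that effect (just a sketch; the n values are arbitrary) is to time c_test at a few sizes and check that the runtime grows faster than linearly in n:

# doubling n should much more than double the time for the c() version
microbenchmark(c_test(5e+3), c_test(1e+4), c_test(2e+4), times = 10)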
I appreciate that the e_test (vectorised) approach isn't applicable in your Fibonacci use case, but I left it in for comparison anyway, to give an idea of the scale of speed-up when vectorisation is possible.
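For the Fibonacci case itself, the d_test pattern carries over directly. Something along these lines (only a sketch; fib_test and the 1, 1 starting values are my assumptions, not taken from your question) preallocates once and then fills by index:

# preallocate the result, then fill by index - same idea as d_test
fib_test <- function(n) {   # assumes n >= 3
  fv <- numeric(n)
  fv[1:2] <- c(1, 1)
  for (i in 3:n) fv[i] <- fv[i - 1] + fv[i - 2]
  fv
}

fib_test(10)
# [1]  1  1  2  3  5  8 13 21 34 55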