I was benchmarking the sample
function in R and comparing it with igraph:sample_seq
and ran into a strange result.
When I run something like:
library(microbenchmark)
library(igraph)
set.seed(1234)
N <- 55^4
M <- 500
(mbm <- microbenchmark(v1 = {sample(N,M)},
v2 = {igraph::sample_seq(1,N,M)}, times=50))
I get a result like this:
Unit: microseconds
expr min lq mean median uq max neval
v1 21551.475 22655.996 26966.22166 23748.2555 28340.974 47566.237 50
v2 32.873 37.952 82.85238 81.7675 96.141 358.277 50
But when I run, for example,
set.seed(1234)
N <- 100^4
M <- 500
(mbm <- microbenchmark(v1 = {sample(N,M)},
v2 = {igraph::sample_seq(1,N,M)}, times=50))
I get a much faster result for sample
:
Unit: microseconds
expr min lq mean median uq max neval
v1 52.165 55.636 64.70412 58.2395 78.636 88.120 50
v2 39.174 43.504 62.09600 53.5715 73.253 176.419 50
It seems that when N
is a power of 10 (or some other special number?), sample
is much faster than other smaller N
that are not powers of 10. Is this expected behavior or am I missing something?
The microbenchmark package is useful for running small sections of code to assess performance, as well as for comparing the speed of several functions that do the same thing.
sample()
or rather sample.int()
by default uses a hash algorithm when certain conditions are met, one being that n > 1e7.
If the second benchmark is rerun without hashing you'll see that it is also much slower than than the igraph function.
set.seed(1234)
N2 <- 100^4
M <- 500
(mbm <- microbenchmark(v1 = {sample.int(N2,M, useHash = FALSE)},
v2 = {igraph::sample_seq(1,N2,M)}, times=50))
Unit: microseconds
expr min lq mean median uq max neval cld
v1 144297.936 150368.649 167224.95664 154283.077 157832.520 407710.78 50 b
v2 61.218 65.392 92.35544 87.885 118.262 148.87 50 a
From the documentation for the useHash
argument:
logical indicating if the hash-version of the algorithm should be used. Can only be used for replace = FALSE, prob = NULL, and size <= n/2, and really should be used for large n, as useHash=FALSE will use memory proportional to n.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With