Is there a faster method for taking a random subsample (without replacement) than the base::sample function?
You can get a small speed-up by bypassing the R-level sample() wrapper and calling its .Internal routine directly:
> x <- rnorm(10000)
> system.time(for(i in 1:100000) x[.Internal(sample(10000L, 10L, FALSE, NULL))])
user system elapsed
2.873 0.017 2.851
> system.time(for(i in 1:100000) sample(x,10))
user system elapsed
3.420 0.025 3.258
Depending on your problem, there may be other, cleverer ways to speed up your code. Think about ways to replace many small calls to sample with one big one.
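For instance, when you need many small samples with replacement from the same vector, one large sample call followed by a reshape replaces the loop entirely (a sketch; the 100 × 10 sizes are illustrative, not from the question):

```r
x <- rnorm(10000)

# Slow: 100 separate calls, each drawing 10 values with replacement
many_small <- lapply(1:100, function(i) sample(x, 10, replace = TRUE))

# Fast: one call drawing all 1000 values at once, reshaped into a
# 100 x 10 matrix; valid because draws with replacement are independent
one_big <- matrix(sample(x, 100 * 10, replace = TRUE), nrow = 100)
```

Note this shortcut only applies directly to sampling with replacement; without replacement, the rows of the matrix would no longer be independent subsamples.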
No.
I can get 10,000 samples in 3 ms on my laptop with replacement, and in 5 ms without replacement. Drawing multiple times from 500 distributions takes 66 ms. How fast do you need it to be?
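Timings in that ballpark can be checked with base functions alone (the vector size here is an assumption; exact numbers depend on hardware):

```r
x <- rnorm(1e6)

# 10,000 draws with replacement
t_repl <- system.time(s1 <- sample(x, 10000, replace = TRUE))

# 10,000 draws without replacement
t_norepl <- system.time(s2 <- sample(x, 10000, replace = FALSE))

t_repl["elapsed"]
t_norepl["elapsed"]
```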
The dqrng package tackles faster sampling in R. Here is an example and benchmark:
library(dqrng)
library(bench)

m <- 1000
n <- 99999
all <- m * n

bm <- bench::mark(
  samp   = sample(x = c(1, -1), size = all, replace = TRUE),
  dqsamp = dqsample(x = c(1, -1), size = all, replace = TRUE),
  check = FALSE,
  iterations = 3
)
bm
# # A tibble: 2 x 13
# expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time
# <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm>
# 1 samp 6.37s 6.59s 0.153 1.12GB 0.153 3 3 19.56s
# 2 dqsamp 1.07s 1.43s 0.723 1.12GB 0.482 3 2 4.15s
# # ... with 4 more variables: result <list>, memory <list>, time <list>, gc <list>
Here is a related blogpost: https://www.r-bloggers.com/2019/04/fast-sampling-support-in-dqrng/.
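For the original without-replacement case, dqrng also provides dqsample.int, an analogue of base sample.int for drawing indices. A sketch, guarded with requireNamespace since dqrng may not be installed (the n and k values are illustrative):

```r
n <- 1e6   # population size
k <- 10000 # subsample size

if (requireNamespace("dqrng", quietly = TRUE)) {
  # fast index sample without replacement via dqrng
  idx <- dqrng::dqsample.int(n, k)
} else {
  # fall back to base R when dqrng is unavailable
  idx <- sample.int(n, k)
}
```

The resulting idx can then subset any vector of length n, e.g. x[idx].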