
Fast sampling in R



Is there a faster method than the base::sample function for taking a random subsample (without replacement)?

Asked Mar 28 '11 at 11:03 by user680111


3 Answers

You can get a small speed-up by skipping the base::sample wrapper and calling the internal sampling routine directly:

> x<- rnorm(10000)
> system.time(for(i in 1:100000) x[.Internal(sample(10000L, 10L, FALSE, NULL))])
   user  system elapsed 
  2.873   0.017   2.851 
> system.time(for(i in 1:100000) sample(x,10))
   user  system elapsed 
  3.420   0.025   3.258 

Depending on your problem, there may be cleverer ways to speed up your code. In particular, look for ways to replace many small calls to sample with one big one.
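One way to apply that advice to the question's setting (many size-10 subsamples, each without replacement, from a pool of 10,000) is to draw all indices in a single sample.int(..., replace = TRUE) call and then redraw only the rows that happen to contain a repeated index. This is a swapped-in technique (rejection sampling), not something from the answer, and the helper names has_dup_row() and sample_rows_no_replace() are hypothetical. Because a row is kept only if its indices are all distinct, each accepted row has the same distribution as sample.int(n, k).

```r
# Hypothetical helper: TRUE for each row of matrix m that contains a
# repeated value. All rows are sorted in one vectorized order() call
# (by row index, then by value), so no per-row R function call is needed.
has_dup_row <- function(m) {
  s <- matrix(m[order(row(m), m)], nrow = nrow(m), byrow = TRUE)
  # after sorting, a row has a duplicate iff two neighbours are equal
  rowSums(s[, -1, drop = FALSE] == s[, -ncol(s), drop = FALSE]) > 0
}

# Hypothetical helper: n_rows independent subsamples of size k, each drawn
# without replacement from 1:n, returned as an n_rows x k index matrix.
sample_rows_no_replace <- function(n, k, n_rows) {
  stopifnot(k <= n)
  # one big draw WITH replacement for all rows at once
  idx <- matrix(sample.int(n, n_rows * k, replace = TRUE), nrow = n_rows)
  bad <- has_dup_row(idx)
  # redraw whole rows that contain a repeat until none remain
  while (any(bad)) {
    nb <- sum(bad)
    idx[bad, ] <- matrix(sample.int(n, nb * k, replace = TRUE), nrow = nb)
    bad[bad] <- has_dup_row(idx[bad, , drop = FALSE])
  }
  idx
}

set.seed(1)
x <- rnorm(10000)
idx <- sample_rows_no_replace(length(x), 10L, 100000L)
subsamples <- matrix(x[idx], nrow = nrow(idx))  # one subsample per row
```

This only pays off when k is small relative to n: a row is accepted with probability n(n-1)...(n-k+1)/n^k, roughly 0.9955 for n = 10000 and k = 10, but that probability drops quickly once k approaches sqrt(n).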

Answered Nov 11 '22 at 23:11 by Ian Fellows



On my laptop I can draw 10,000 samples in 3 ms with replacement and in 5 ms without replacement. Drawing multiple times from 500 distributions takes 66 ms. How fast do you need it to be?
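The answer does not say what it sampled from or how it timed the draws. Below is a minimal sketch for getting comparable numbers on your own machine, assuming (hypothetically) a pool of one million uniform values; timings depend on hardware and R version, so none are claimed here.

```r
set.seed(1)
pool <- runif(1e6)  # hypothetical pool; the answer does not state its data

# 10,000 draws with and without replacement, each in a single call
system.time(with_repl <- sample(pool, 10000, replace = TRUE))
system.time(without_repl <- sample(pool, 10000, replace = FALSE))

# Millisecond-scale calls are near system.time()'s resolution, so repeat
# the call and average (bench::mark() from the bench package automates this).
reps <- 1000
timing <- system.time(for (i in seq_len(reps)) sample(pool, 10000))
timing[["elapsed"]] / reps  # average seconds per call
```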

Answered Nov 11 '22 at 22:11 by John


The dqrng package provides faster sampling functions for R. Here is an example and benchmark:


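library(dqrng)  # provides dqsample(); bench::mark() is called via its namespace

m <- 1000
n <- 99999
total <- m * n  # about 1e8 draws; avoids shadowing base::all
bm <- bench::mark(samp = sample(x = c(1, -1), size = total, replace = TRUE),
                  dqsamp = dqsample(x = c(1, -1), size = total, replace = TRUE),
                  check = FALSE,  # the two generators produce different streams
                  iterations = 3)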

# # A tibble: 2 x 13
#   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
#   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
# 1 samp          6.37s    6.59s     0.153    1.12GB    0.153     3     3     19.56s
# 2 dqsamp        1.07s    1.43s     0.723    1.12GB    0.482     3     2      4.15s
# # ... with 4 more variables: result <list>, memory <list>, time <list>, gc <list>

Related blog post: https://www.r-bloggers.com/2019/04/fast-sampling-support-in-dqrng/.
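Applied to the original question (a size-10 subsample without replacement), a minimal sketch with dqrng might look like this, assuming the package is installed from CRAN. dqrng keeps its own generator state, so it is seeded with dqset.seed() rather than set.seed(); dqsample.int() mirrors base sample.int(). No timing results are claimed.

```r
library(dqrng)

x <- rnorm(10000)

# dqrng has its own RNG state: seed it with dqset.seed(), not set.seed()
dqset.seed(42)
a <- x[dqsample.int(length(x), 10L)]
dqset.seed(42)
a2 <- x[dqsample.int(length(x), 10L)]  # same seed, same subsample

# Many subsamples: the same loop as with base R, only the sampler changes
system.time(for (i in 1:100000) x[dqsample.int(10000L, 10L)])
```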

Answered Nov 11 '22 at 23:11 by A.Fischer
