Fastest way to generate random boolean vector

Tags: random, r
Given a vector p of probabilities, what is a fast way to generate a random boolean vector x of same length as p and with independent elements such that x[i]==TRUE with probability p[i] for every i?

Specifically, is there a faster way than either of these?

p <- rep(0.5,10e6)

system.time(runif(length(p)) < p)
   user  system elapsed 
   0.36    0.02    0.37 

system.time(rbinom(length(p),1,p)>0)
   user  system elapsed 
   1.14    0.04    1.17 

asked Oct 01 '15 11:10 by Museful


2 Answers

Using the sample function is faster on my machine, although this only covers the case where every p[i] is 0.5:

system.time(runif(length(p)) < p)
    user  system elapsed 
    0.315   0.002   0.318 
system.time(sample(c(TRUE,FALSE), 10e6, TRUE))
    user  system elapsed 
    0.2     0.0     0.2 
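
The sample() call above draws TRUE and FALSE with equal probability, so it matches the question only when all elements of p equal 0.5. A minimal sketch of how a constant probability other than 0.5 could be handled with the prob argument of sample(), assuming a hypothetical scalar p0 (not part of the original answer); for a p whose elements differ, the elementwise comparison from the question still applies. No timing claim is made for either variant.

```r
# p0 is a hypothetical constant probability (an assumption, not from the answer).
p0 <- 0.3
n <- 1e6
set.seed(42)

# Constant probability: weight TRUE and FALSE via the prob argument.
x_const <- sample(c(TRUE, FALSE), n, replace = TRUE, prob = c(p0, 1 - p0))
mean(x_const)  # should be close to p0

# General p with differing elements: compare uniforms elementwise.
p <- runif(n)
x_general <- runif(length(p)) < p
```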

answered Nov 18 '22 19:11 by Edwin


library(microbenchmark)

p <- rep(0.5,10e6)
microbenchmark(runif(length(p)) < p,
               sample.int(n=10,size=length(p),replace=TRUE) < p*10+1,
               times=10)

This gives the following results:

expr                      min       lq     mean   median       uq      max neval
runif(length(p)) < p 465.7474 467.6487 477.6264 469.5444 477.7114 541.8130    10
sample.int(n = 10,   266.1194 268.7164 311.0995 307.0160 333.6954 418.2309    10

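Note that with n=10 each p[i] is effectively rounded up to the next multiple of 0.1, so the result is exact only when p*10 is an integer. For low or finer-grained probabilities p you might want to use an integer larger than 10, both as n in sample.int and in place of 10 in p*10+1.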
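
A minimal sketch of that adjustment, assuming a hypothetical helper rbool_grid with a resolution parameter n (neither name appears in the answer). An integer k drawn uniformly from 1..n satisfies k < p*n+1 for exactly p*n of the n values when p*n is a whole number, so n should be chosen large enough that every p[i] is (close to) a multiple of 1/n.

```r
# Hypothetical helper (not from the answer): n is the grid resolution.
# TRUE is returned with probability p[i] when p[i] * n is an integer.
rbool_grid <- function(p, n = 1000L) {
  sample.int(n = n, size = length(p), replace = TRUE) < p * n + 1
}

set.seed(1)
p <- c(0.001, 0.25, 0.5)

# Repeat each probability 1e5 times, then look at the observed frequencies.
x <- rbool_grid(rep(p, each = 1e5), n = 1000L)
colMeans(matrix(x, ncol = length(p)))
```

With n = 10 the first probability, 0.001, would be treated as 0.1, which is why a larger n matters for small p.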

Let's check whether both approaches give the same frequencies:

set.seed(1234)
p=c(0.1,0.5)
sample_matrix=matrix(NA_real_,nrow=1e6,ncol=length(p))

for (i in 1:nrow(sample_matrix)) sample_matrix[i,]=(runif(length(p)) < p)
colSums(sample_matrix)/nrow(sample_matrix)
#[1] 0.100026 0.500340
for (i in 1:nrow(sample_matrix)) sample_matrix[i,]=(sample.int(n=10,size=length(p),replace=TRUE) < p*10+1)
colSums(sample_matrix)/nrow(sample_matrix)
#[1] 0.100535 0.499451
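
The same check can probably be done without the row-by-row loop by drawing all samples in one vectorized call. This is a sketch under the assumption that only the observed frequencies matter; the names n_rep, pp, freq_runif and freq_sample are illustrative, and the exact numbers will differ from the ones above because the random draws are consumed in a different order.

```r
set.seed(1234)
p <- c(0.1, 0.5)
n_rep <- 1e6

# Repeat each probability n_rep times so that each block of n_rep draws
# becomes one column of the matrix (R fills matrices column by column).
pp <- rep(p, each = n_rep)

freq_runif <- colMeans(matrix(runif(length(pp)) < pp, ncol = length(p)))
freq_sample <- colMeans(matrix(
  sample.int(n = 10, size = length(pp), replace = TRUE) < pp * 10 + 1,
  ncol = length(p)
))

freq_runif
freq_sample
```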

answered Nov 18 '22 20:11 by cryo111