Given a vector p of probabilities, what is a fast way to generate a random boolean vector x of the same length as p, with independent elements, such that x[i]==TRUE with probability p[i] for every i?
Specifically, is there a faster way than these?
p <- rep(0.5,10e6)
system.time(runif(length(p)) < p)
   user  system elapsed 
   0.36    0.02    0.37 
system.time(rbinom(length(p),1,p)>0)
   user  system elapsed 
   1.14    0.04    1.17 
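Using the sample function is faster on my machine. Note that the call below draws TRUE and FALSE with equal probability, so it only covers the case where every p[i] is 0.5: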
system.time(runif(length(p)) < p)
    user  system elapsed 
    0.315   0.002   0.318 
system.time(sample(c(TRUE,FALSE), 10e6, TRUE))
    user  system elapsed 
    0.2     0.0     0.2 
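If all elements of p share one value other than 0.5, the same idea can be kept by passing weights through the prob argument of sample(). This is a minimal sketch, assuming a hypothetical common probability p0. No timing is claimed for it, and it does not handle a p whose elements differ:

```r
# Assumption: every element of p equals the same value p0 (p0 and n are illustrative names).
p0 <- 0.3
n  <- 1e6

set.seed(42)
# prob gives the weights for TRUE and FALSE, in that order
x <- sample(c(TRUE, FALSE), size = n, replace = TRUE, prob = c(p0, 1 - p0))

mean(x)  # empirical frequency of TRUE, should be close to p0
```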
library(microbenchmark)
p <- rep(0.5,10e6)
microbenchmark(runif(length(p)) < p,
               sample.int(n=10,size=length(p),replace=TRUE) < p*10+1,
               times=10)
This gives the following results:
expr                   min       lq     mean   median       uq      max neval
runif(length(p)) < p   465.7474 467.6487 477.6264 469.5444 477.7114 541.8130    10
sample.int(n = 10,     266.1194 268.7164 311.0995 307.0160 333.6954 418.2309    10
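The sample.int comparison is exact only when p*10 is an integer; in general it returns TRUE with probability ceiling(p*10)/10. For small probabilities, or probabilities that are not multiples of 0.1, replace 10 with a larger integer n in both sample.int and the threshold (that is, use p*n+1).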
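The effect of the grid size can be seen with a small sketch. The helper name rbool_grid and its argument n are made up for illustration and are not part of the benchmark above. It assumes p*n is computed without floating-point error, which is not guaranteed for every p:

```r
# Hypothetical helper: Bernoulli draws via integer sampling on a grid of size n.
# The effective success probability is ceiling(p * n) / n, not p itself.
rbool_grid <- function(p, n = 10L) {
  sample.int(n = n, size = length(p), replace = TRUE) < p * n + 1
}

set.seed(1)
p <- rep(0.25, 1e6)

freq_coarse <- mean(rbool_grid(p, n = 10L))    # grid too coarse: ceiling(2.5) / 10 = 0.3
freq_fine   <- mean(rbool_grid(p, n = 1000L))  # 0.25 * 1000 = 250 is an integer, so exact

c(coarse = freq_coarse, fine = freq_fine)      # roughly 0.3 and 0.25
```

With n = 10 a requested probability of 0.25 is silently rounded up to 0.3, so n should be chosen such that every p[i]*n is an integer, or large enough that the rounding error is acceptable.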
Let's check whether both approaches give the same frequencies:
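set.seed(1234)
p <- c(0.1, 0.5)
# One row per repetition, one column per element of p (logicals are stored as 0/1)
sample_matrix <- matrix(NA_real_, nrow=1e6, ncol=length(p))
# Approach 1: compare uniform draws against p
for (i in seq_len(nrow(sample_matrix))) sample_matrix[i,] <- (runif(length(p)) < p)
colSums(sample_matrix)/nrow(sample_matrix)
#[1] 0.100026 0.500340
# Approach 2: compare integer draws from 1..10 against p*10+1
for (i in seq_len(nrow(sample_matrix))) sample_matrix[i,] <- (sample.int(n=10, size=length(p), replace=TRUE) < p*10+1)
colSums(sample_matrix)/nrow(sample_matrix)
#[1] 0.100535 0.499451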
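The same check can be done without the row-by-row loops by drawing all values at once. This is only a sketch of an alternative: the names n_rep, P, freq_runif and freq_sample are illustrative, the number of repetitions is reduced, and the resulting numbers will differ from those above because the random draws are consumed in a different order:

```r
set.seed(1234)
p <- c(0.1, 0.5)
n_rep <- 1e5  # number of repetitions (assumed value, smaller than in the loop version)

# Thresholds laid out so that every row is a copy of p: P[i, j] == p[j]
P <- matrix(p, nrow = n_rep, ncol = length(p), byrow = TRUE)

# Approach 1: uniform draws compared against p
freq_runif <- colMeans(matrix(runif(length(P)), nrow = n_rep) < P)

# Approach 2: integer draws from 1..10 compared against p*10+1
draws <- matrix(sample.int(10L, length(P), replace = TRUE), nrow = n_rep)
freq_sample <- colMeans(draws < P * 10 + 1)

rbind(freq_runif, freq_sample)  # both rows should be close to p
```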