Given a vector p of probabilities, what is a fast way to generate a random boolean vector x of the same length as p, with independent elements, such that x[i]==TRUE with probability p[i] for every i?
Specifically, is there a faster way than either of these?
p <- rep(0.5,10e6)
system.time(runif(length(p)) < p)
user system elapsed
0.36 0.02 0.37
system.time(rbinom(length(p),1,p)>0)
user system elapsed
1.14 0.04 1.17
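For reference, the first of these approaches can be wrapped in a small helper. This is only a sketch: the name rbernoulli_lgl and the input checks are illustrative assumptions, not something from the question, and the checks add an extra pass over p that you may want to drop when timing.

```r
# Hypothetical helper (name and checks are assumptions for illustration).
# Draws one uniform number per element and compares it with p[i], so
# x[i] is TRUE with probability p[i], independently across i.
rbernoulli_lgl <- function(p) {
  stopifnot(is.numeric(p), !anyNA(p), all(p >= 0 & p <= 1))
  runif(length(p)) < p
}

set.seed(42)
p <- c(0, 0.25, 0.5, 1)
x <- rbernoulli_lgl(p)
str(x)  # logical vector of the same length as p
```

Because runif does not return the endpoints 0 or 1, an entry with p[i] equal to 0 always gives FALSE and an entry equal to 1 always gives TRUE.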
Using the sample function is faster on my machine, although the call below draws every element with probability 0.5 and does not use the vector p:
system.time(runif(length(p)) < p)
user system elapsed
0.315 0.002 0.318
system.time(sample(c(TRUE,FALSE), 10e6, TRUE))
user system elapsed
0.2 0.0 0.2
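If all entries of p share a single value other than 0.5, the prob argument of sample can carry it. A minimal sketch, assuming one common probability (the names p0 and n are illustrative):

```r
# Sketch: every element shares the same probability p0 (assumed name).
# This does not handle a vector p with differing entries.
p0 <- 0.2
n  <- 1e6
set.seed(1)
x <- sample(c(TRUE, FALSE), size = n, replace = TRUE, prob = c(p0, 1 - p0))
mean(x)  # should be close to p0
```

Whether this remains faster than runif once prob is supplied is something to benchmark on your own machine.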
library(microbenchmark)
p <- rep(0.5,10e6)
microbenchmark(runif(length(p)) < p,
               sample.int(n=10,size=length(p),replace=TRUE) < p*10+1,
               times=10)
This gives the following results:
                 expr      min       lq     mean   median       uq      max neval
 runif(length(p)) < p 465.7474 467.6487 477.6264 469.5444 477.7114 541.8130    10
   sample.int(n = 10, 266.1194 268.7164 311.0995 307.0160 333.6954 418.2309    10
For low probabilities p, you might want to use an integer larger than 10 in the sample.int call and in p*10+1, because with 10 the comparison can only produce probabilities that are multiples of 1/10.
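The effect of the integer range can be sketched with an adjustable grid size. The helper name rbool_grid and the choice of 1000 are assumptions for illustration; n = 10 reproduces the expression benchmarked above.

```r
# Sketch with an adjustable grid size n (n = 10 matches the code above).
# The comparison can only realise probabilities that are multiples of 1/n,
# so p[i] is effectively rounded up to the next multiple of 1/n
# (up to floating-point rounding in p * n + 1).
rbool_grid <- function(p, n = 10L) {
  sample.int(n = n, size = length(p), replace = TRUE) < p * n + 1
}

set.seed(1)
p <- rep(0.003, 1e6)
coarse <- mean(rbool_grid(p, n = 10L))    # far from 0.003: grid too coarse
fine   <- mean(rbool_grid(p, n = 1000L))  # much closer to 0.003
c(coarse = coarse, fine = fine)
```

With n = 10, only the draw 1 satisfies the comparison for p = 0.003, so roughly one element in ten comes out TRUE. The grid therefore has to be fine enough for the smallest probabilities in p.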
Let's check whether both approaches give the same frequencies:
set.seed(1234)
p=c(0.1,0.5)
# one row per replication, one column per element of p
sample_matrix=matrix(NA_real_,nrow=1e6,ncol=length(p))
# runif approach: fill each row with one draw, then compute column frequencies
for (i in 1:nrow(sample_matrix)) sample_matrix[i,]=(runif(length(p)) < p)
colSums(sample_matrix)/nrow(sample_matrix)
#[1] 0.100026 0.500340
# sample.int approach: same check
for (i in 1:nrow(sample_matrix)) sample_matrix[i,]=(sample.int(n=10,size=length(p),replace=TRUE) < p*10+1)
colSums(sample_matrix)/nrow(sample_matrix)
#[1] 0.100535 0.499451
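The same kind of frequency check can also be written without the row-by-row loop. This is a sketch under the assumption that only the column frequencies matter. The names n_rep and draws are illustrative, and the exact numbers will differ from those above because the random draws are consumed in a different order.

```r
# Sketch: vectorised frequency check (n_rep and draws are assumed names).
set.seed(1234)
p     <- c(0.1, 0.5)
n_rep <- 1e6
# matrix() fills column by column, so rep(p, each = n_rep) lines up p[j]
# with every entry of column j.
draws <- matrix(runif(n_rep * length(p)) < rep(p, each = n_rep), nrow = n_rep)
freq  <- colMeans(draws)
freq  # each column's TRUE frequency should be close to the matching p[j]
```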