Consider the following sets of probabilities (the three events are NOT mutually exclusive):
How do I compute the probability of at least one event happening (i.e. the union)?
If possible I would prefer a generic, self-contained solution that could also handle 4 or more events. In this case the answer I'm looking for is:
0.05625 + 0.05625 + 0.05625 -
0.05625*0.05625 - 0.05625*0.05625 - 0.05625*0.05625 +
0.05625*0.05625*0.05625
##[1] 0.1594358
My question is ultimately a bit broader than the title, as I'm looking for functions that can compute the probability of union, intersection (0.05625*0.05625*0.05625 = 0.0001779785
), no event happening (1 - 0.1594358 = 0.8405642
) or exactly one event happening (0.150300
). In other words an R solution to this on-line Conjunction of three events calculator. I've already looked into the prob
package, but it seems to have an interface too complicated for such a simplistic use-case.
You can get the probability of exactly 0, 1, 2, or 3 of these occurring with the binomial density function dbinom
, which returns the probability of getting exactly the number of specified successes (first argument) given the total number of independent attempts (second argument) and the probability of success for each attempt (third argument):
dbinom(0:3, 3, 0.05625)
# [1] 0.8405642090 0.1502995605 0.0089582520 0.0001779785
So if you want the probability of at least one happening, that is:
sum(dbinom(1:3, 3, 0.05625))
# [1] 0.1594358
or
1 - dbinom(0, 3, 0.05625)
# [1] 0.1594358
The dbinom
function can address your other questions as well. For instance, the probability of all happening is:
dbinom(3, 3, 0.05625)
# [1] 0.0001779785
The probability of exactly one is:
dbinom(1, 3, 0.05625)
# [1] 0.1502996
The probability of none is:
dbinom(0, 3, 0.05625)
# [1] 0.8405642
If you have unequal probabilities stored in a vector p
and each item is selected independently, you need to do a bit more work, because the dbinom
function doesn't apply. Still, some of the computations are pretty simple.
The probability of none of the items being selected is simply the product of 1 minus the probabilities (the probability that at least one is selected is simply one minus this):
p <- c(0.1, 0.2, 0.3)
prod(1-p)
# [1] 0.504
The probability of all is the product of the probabilities:
prod(p)
# [1] 0.006
Finally, the probability of exactly one being selected is the sum across all the elements of its probability times the probability of all the other elements not being selected:
sum(p * (prod(1-p) / (1-p)))
# [1] 0.398
Similarly, the probability of exactly n-1
being selected (where n
is the number of probabilities) is:
sum((1-p) * (prod(p) / p))
# [1] 0.092
If you want the probability of every single possible count of successes, one option could be computing all 2^n
event combinations (which is what A. Webb does in their answer). Instead, the following is a O(n^2) scheme:
cp.quadratic <- function(p) {
P <- matrix(0, nrow=length(p), ncol=length(p))
P[1,] <- rev(cumsum(rev(p * prod(1-p) / (1-p))))
for (i in seq(2, length(p))) {
P[i,] <- c(rev(cumsum(rev(head(p, -1) / (1-head(p, -1)) * tail(P[i-1,], -1)))), 0)
}
c(prod(1-p), P[,1])
}
cp.quadratic(c(0.1, 0.2, 0.3))
# [1] 0.504 0.398 0.092 0.006
Basically, we define P_ij to be the probability that we have exactly i
successes, all of which are in position j
or greater. Base cases for i=0
and i=1
are relatively simple to compute, and then we have the following recurrence:
P_ij = P_i(j+1) + p_j / (1-p_j) * P_(i-1)(j+1)
In the function cp.quadratic
, we loop with increasing i
, filling out the P
matrix (which is n
x n
). Therefore the total operation count is O(n^2).
This enables you, for instance, to compute the distribution for pretty large numbers of options in under a second:
system.time(cp.quadratic(sample(c(.1, .2, .3), 100, replace=T)))
# user system elapsed
# 0.005 0.000 0.006
system.time(cp.quadratic(sample(c(.1, .2, .3), 1000, replace=T)))
# user system elapsed
# 0.165 0.043 0.208
system.time(cp.quadratic(sample(c(.1, .2, .3), 10000, replace=T)))
# user system elapsed
# 12.721 3.161 16.567
We can compute the distribution from 1,000 elements in a fraction of a second and from 10,000 elements in under a minute; computing 2^1000 or 2^10000 possible outcomes would take a prohibitively long time (the number of subsets are a 301-digit and 3010-digit number, respectively).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With