Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Computing a binomial probability for huge numbers

I want to compute binomial probabilities on python. I tried to apply the formula:

probability = scipy.misc.comb(n,k)*(p**k)*((1-p)**(n-k))

Some of the probabilities I get are infinite. I checked some values for which p=inf. For one of them, n=450,000 and k=17. This value must be greater than 1e302 which is the maximum value handled by floats.

I then tried to use sum(np.random.binomial(n,p,numberOfTrials)==valueOfInterest)/numberOfTrials

This draws numberOfTrials samples and computes the average number of times the value valueOfInterest is drawn.

This doesn't raise any infinite value. However, is this a valid way to proceed? And why this way wouldn't raise any infinite value whereas computing the probabilities does?

like image 213
bigTree Avatar asked Mar 05 '14 15:03

bigTree


People also ask

Can you use binomial distribution when n is large?

However, for N much larger than n, the binomial distribution remains a good approximation, and is widely used.

What happens to binomial distribution when n is large?

For large values of n, the distributions of the count X and the sample proportion are approximately normal. This result follows from the Central Limit Theorem. The mean and variance for the approximately normal distribution of X are np and np(1-p), identical to the mean and variance of the binomial(n,p) distribution.

How do you calculate binomial probability?

The binomial distribution is calculated by multiplying the probability of success raised to the power of the number of successes and the probability of failure raised to the power of the difference between the number of successes and the number of trials.

How does sample size affect binomial distribution?

The n observations will be nearly independent when the size of the population is much larger than the size of the sample. As a rule of thumb, the binomial sampling distribution for counts can be used when the population is at least 20 times as large as the sample.


1 Answers

Because you're using scipy I thought I would mention that scipy already has statistical distributions implemented. Also note that when n is this large the binomial distribution is well approximated by the normal distribution (or Poisson if p is very small).

n = 450000
p = .5
k = np.array([17., 225000, 226000])

b = scipy.stats.binom(n, p)
print b.pmf(k)
# array([  0.00000000e+00,   1.18941527e-03,   1.39679862e-05])
n = scipy.stats.norm(n*p, np.sqrt(n*p*(1-p)))
print n.pdf(k)
# array([  0.00000000e+00,   1.18941608e-03,   1.39680605e-05])

print b.pmf(k) - n.pdf(k)
# array([  0.00000000e+00,  -8.10313274e-10,  -7.43085142e-11])
like image 125
Bi Rico Avatar answered Sep 20 '22 15:09

Bi Rico