Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I efficiently calculate the binomial cumulative distribution function?

Let's say that I know the probability of a "success" is P. I run the test N times, and I see S successes. The test is akin to tossing an unevenly weighted coin (perhaps heads is a success, tails is a failure).

I want to know the approximate probability of seeing either S successes, or a number of successes less likely than S successes.

So for example, if P is 0.3, N is 100, and I get 20 successes, I'm looking for the probability of getting 20 or fewer successes.

If, on the other hadn, P is 0.3, N is 100, and I get 40 successes, I'm looking for the probability of getting 40 our more successes.

I'm aware that this problem relates to finding the area under a binomial curve, however:

  1. My math-fu is not up to the task of translating this knowledge into efficient code
  2. While I understand a binomial curve would give an exact result, I get the impression that it would be inherently inefficient. A fast method to calculate an approximate result would suffice.

I should stress that this computation has to be fast, and should ideally be determinable with standard 64 or 128 bit floating point computation.

I'm looking for a function that takes P, S, and N - and returns a probability. As I'm more familiar with code than mathematical notation, I'd prefer that any answers employ pseudo-code or code.

like image 750
sanity Avatar asked Jul 08 '09 01:07

sanity


People also ask

How do you find the binomial cumulative distribution function?

y = binocdf( x , n , p ) computes a binomial cumulative distribution function at each of the values in x using the corresponding number of trials in n and the probability of success for each trial in p .

What does cumulative mean in binomial distribution?

The Cumulative Distribution Function – Calculates the probability of there being at most x successes from n independent trials.

How do you find the probability of a binomial distribution being successful?

Binomial probability refers to the probability of exactly x successes on n repeated trials in an experiment which has two possible outcomes (commonly called a binomial experiment). If the probability of success on an individual trial is p , then the binomial probability is nCx⋅px⋅(1−p)n−x .


1 Answers

Exact Binomial Distribution

def factorial(n): 
    if n < 2: return 1
    return reduce(lambda x, y: x*y, xrange(2, int(n)+1))

def prob(s, p, n):
    x = 1.0 - p

    a = n - s
    b = s + 1

    c = a + b - 1

    prob = 0.0

    for j in xrange(a, c + 1):
        prob += factorial(c) / (factorial(j)*factorial(c-j)) \
                * x**j * (1 - x)**(c-j)

    return prob

>>> prob(20, 0.3, 100)
0.016462853241869437

>>> 1-prob(40-1, 0.3, 100)
0.020988576003924564

Normal Estimate, good for large n

import math
def erf(z):
        t = 1.0 / (1.0 + 0.5 * abs(z))
        # use Horner's method
        ans = 1 - t * math.exp( -z*z -  1.26551223 +
                                                t * ( 1.00002368 +
                                                t * ( 0.37409196 + 
                                                t * ( 0.09678418 + 
                                                t * (-0.18628806 + 
                                                t * ( 0.27886807 + 
                                                t * (-1.13520398 + 
                                                t * ( 1.48851587 + 
                                                t * (-0.82215223 + 
                                                t * ( 0.17087277))))))))))
        if z >= 0.0:
                return ans
        else:
                return -ans

def normal_estimate(s, p, n):
    u = n * p
    o = (u * (1-p)) ** 0.5

    return 0.5 * (1 + erf((s-u)/(o*2**0.5)))

>>> normal_estimate(20, 0.3, 100)
0.014548164531920815

>>> 1-normal_estimate(40-1, 0.3, 100)
0.024767304545069813

Poisson Estimate: Good for large n and small p

import math

def poisson(s,p,n):
    L = n*p

    sum = 0
    for i in xrange(0, s+1):
        sum += L**i/factorial(i)

    return sum*math.e**(-L)

>>> poisson(20, 0.3, 100)
0.013411150012837811
>>> 1-poisson(40-1, 0.3, 100)
0.046253037645840323
like image 110
Unknown Avatar answered Oct 30 '22 10:10

Unknown