How can I efficiently calculate the binomial cumulative distribution function?

Tags:

Let's say that I know the probability of a "success" is P. I run the test N times, and I see S successes. The test is akin to tossing an unevenly weighted coin (perhaps heads is a success, tails is a failure).

I want to know the approximate probability of seeing either S successes, or a number of successes less likely than S successes.

So for example, if P is 0.3, N is 100, and I get 20 successes, I'm looking for the probability of getting 20 or fewer successes.

If, on the other hadn, P is 0.3, N is 100, and I get 40 successes, I'm looking for the probability of getting 40 our more successes.

I'm aware that this problem relates to finding the area under a binomial curve, however:

My math-fu is not up to the task of translating this knowledge into efficient code
While I understand a binomial curve would give an exact result, I get the impression that it would be inherently inefficient. A fast method to calculate an approximate result would suffice.

I should stress that this computation has to be fast, and should ideally be determinable with standard 64 or 128 bit floating point computation.

I'm looking for a function that takes P, S, and N - and returns a probability. As I'm more familiar with code than mathematical notation, I'd prefer that any answers employ pseudo-code or code.

750

asked Jul 08 '09 01:07

sanity

1 Answers

Exact Binomial Distribution

def factorial(n): 
    if n < 2: return 1
    return reduce(lambda x, y: x*y, xrange(2, int(n)+1))

def prob(s, p, n):
    x = 1.0 - p

    a = n - s
    b = s + 1

    c = a + b - 1

    prob = 0.0

    for j in xrange(a, c + 1):
        prob += factorial(c) / (factorial(j)*factorial(c-j)) \
                * x**j * (1 - x)**(c-j)

    return prob

>>> prob(20, 0.3, 100)
0.016462853241869437

>>> 1-prob(40-1, 0.3, 100)
0.020988576003924564

Normal Estimate, good for large n

import math
def erf(z):
        t = 1.0 / (1.0 + 0.5 * abs(z))
        # use Horner's method
        ans = 1 - t * math.exp( -z*z -  1.26551223 +
                                                t * ( 1.00002368 +
                                                t * ( 0.37409196 + 
                                                t * ( 0.09678418 + 
                                                t * (-0.18628806 + 
                                                t * ( 0.27886807 + 
                                                t * (-1.13520398 + 
                                                t * ( 1.48851587 + 
                                                t * (-0.82215223 + 
                                                t * ( 0.17087277))))))))))
        if z >= 0.0:
                return ans
        else:
                return -ans

def normal_estimate(s, p, n):
    u = n * p
    o = (u * (1-p)) ** 0.5

    return 0.5 * (1 + erf((s-u)/(o*2**0.5)))

>>> normal_estimate(20, 0.3, 100)
0.014548164531920815

>>> 1-normal_estimate(40-1, 0.3, 100)
0.024767304545069813

Poisson Estimate: Good for large n and small p

import math

def poisson(s,p,n):
    L = n*p

    sum = 0
    for i in xrange(0, s+1):
        sum += L**i/factorial(i)

    return sum*math.e**(-L)

>>> poisson(20, 0.3, 100)
0.013411150012837811
>>> 1-poisson(40-1, 0.3, 100)
0.046253037645840323

110

answered Oct 30 '22 10:10

Unknown

Related questions
                            
                                Is Quicksort a potential security risk?
                            
                                Cut rectangle in minimum number of squares
                            
                                Algorithm for efficient diffing of huge files
                            
                                evaluate whether a number is integer power of 4
                            
                                Why does the order of loops in a matrix multiply algorithm affect performance? [duplicate]
                            
                                Algorithm for minimum manhattan distance
                            
                                Time complexity of N Queen using backtracking?
                            
                                Find smallest number in Sorted Rotatable Array
                            
                                Multi-threaded algorithm for solving sudoku?
                            
                                Fractal Encryption
                            
                                Creating a power set of a Sequence
                            
                                How to sort an array using minimum number of writes?
                            
                                Algorithm / pseudo-code to create paging links?
                            
                                How do you use a Bidirectional BFS to find the shortest path?
                            
                                Generate Random numbers without using any external functions
                            
                                Finding the closest fibonacci numbers
                            
                                Node search in Binary Tree overflows stack
                            
                                How to find the closest element to a given key value in a binary search tree?
                            
                                Print all the permutations of a string in C
                            
                                Optimized TSP Algorithms

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How can I efficiently calculate the binomial cumulative distribution function?

Tags:

algorithm

math

probability

binomial-cdf

sanity

People also ask

1 Answers

Unknown

Recent Activity

Donate For Us