Given a mean and standard deviation defining a normal distribution, how would you calculate the following probabilities in pure Python (i.e. no NumPy/SciPy or other packages not in the standard library)?
I've found some libraries, like Pgnumerics, that provide functions for calculating these, but the underlying math is unclear to me.
Edit: To show this isn't homework, below is my working code for Python <= 2.6, although I'm not sure it handles the boundary conditions correctly.
from math import *
import unittest

def erfcc(x):
    """
    Complementary error function.
    """
    z = abs(x)
    t = 1. / (1. + 0.5*z)
    r = t * exp(-z*z-1.26551223+t*(1.00002368+t*(.37409196+
        t*(.09678418+t*(-.18628806+t*(.27886807+
        t*(-1.13520398+t*(1.48851587+t*(-.82215223+
        t*.17087277)))))))))
    if (x >= 0.):
        return r
    else:
        return 2. - r

def normcdf(x, mu, sigma):
    t = x-mu
    y = 0.5*erfcc(-t/(sigma*sqrt(2.0)))
    if y>1.0:
        y = 1.0
    return y

def normpdf(x, mu, sigma):
    u = (x-mu)/abs(sigma)
    y = (1/(sqrt(2*pi)*abs(sigma)))*exp(-u*u/2)
    return y

def normdist(x, mu, sigma, f):
    if f:
        y = normcdf(x,mu,sigma)
    else:
        y = normpdf(x,mu,sigma)
    return y

def normrange(x1, x2, mu, sigma, f=True):
    """
    Calculates probability of random variable falling between two points.
    """
    p1 = normdist(x1, mu, sigma, f)
    p2 = normdist(x2, mu, sigma, f)
    return abs(p1-p2)
The cdf is simply the integral of the pdf from negative infinity to the value at which it is calculated. Thus to get the integral of the pdf over a range, you simply have to subtract the cdf values at the two end points of the range.
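As a rough check of that relationship, the sketch below integrates the pdf numerically over a range and compares the result with the difference of two cdf values. The names normal_pdf, normal_cdf and integrate_pdf are hypothetical helpers introduced only for this illustration, the cdf is written with math.erf (which assumes Python 2.7 or later), and the midpoint rule is just one convenient way to approximate the integral.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma."""
    u = (x - mu) / sigma
    return math.exp(-0.5 * u * u) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu, sigma):
    """P(X <= x), written with math.erf (assumes Python 2.7 or later)."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def integrate_pdf(a, b, mu, sigma, steps=10000):
    """Midpoint-rule approximation of the integral of the pdf over [a, b]."""
    width = (b - a) / float(steps)
    total = sum(normal_pdf(a + (i + 0.5) * width, mu, sigma)
                for i in range(steps))
    return width * total

# Probability of falling within one standard deviation of the mean.
by_integration = integrate_pdf(-1.0, 1.0, 0.0, 1.0)
by_cdf = normal_cdf(1.0, 0.0, 1.0) - normal_cdf(-1.0, 0.0, 1.0)
print(by_integration)
print(by_cdf)
```

The two printed values should agree to several decimal places, which is why subtracting cdf values is usually preferred over integrating the pdf directly.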
All these are very similar: if you can compute #1 using a function cdf(x), then the solution to #2 is simply 1 - cdf(x), and for #3 it's cdf(x) - cdf(y).
Python has included the (Gauss) error function, math.erf, in the standard library since version 2.7, so you can calculate the cdf of the normal distribution using the equation from the article you linked to:
import math
print(0.5 * (1 + math.erf((x - mean) / math.sqrt(2 * standard_dev**2))))
where mean is the mean and standard_dev is the standard deviation.
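For reuse, the same formula can be wrapped in a function. This is a minimal sketch: the name normal_cdf and the check for a positive standard deviation are assumptions added for illustration, not part of the formula itself. On Python 3.8 or later the standard library also provides statistics.NormalDist, which is used here only as an optional cross-check.

```python
import math

def normal_cdf(x, mean, standard_dev):
    """P(X <= x) for a normal distribution, using the erf formula."""
    if standard_dev <= 0:
        # Assumption for this sketch: reject non-positive standard deviations.
        raise ValueError("standard_dev must be positive")
    return 0.5 * (1 + math.erf((x - mean) / math.sqrt(2 * standard_dev**2)))

print(normal_cdf(0.0, 0.0, 1.0))  # 0.5, because erf(0) is 0

# Optional cross-check: statistics.NormalDist exists only on Python 3.8+.
try:
    from statistics import NormalDist
except ImportError:
    NormalDist = None

if NormalDist is not None:
    print(NormalDist(mu=100.0, sigma=15.0).cdf(115.0))
    print(normal_cdf(115.0, 100.0, 15.0))
```

On versions where NormalDist is available, the last two printed values should match to within floating-point rounding.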
Some notes since what you asked seemed relatively straightforward given the information in the article:
If you have cdf(x), the probability that X < x, then 1 - cdf(x) is the probability that the random variable X >= x. Since >= is equivalent to > for continuous random variables, this is also the probability that X > x.
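Putting the three cases together, a sketch might look like the following. The function names prob_below, prob_above and prob_between, and the example values (mean 100, standard deviation 15), are made up for illustration.

```python
import math

def normal_cdf(x, mean, standard_dev):
    """P(X <= x) for a normal distribution with the given parameters."""
    return 0.5 * (1 + math.erf((x - mean) / (standard_dev * math.sqrt(2))))

def prob_below(x, mean, standard_dev):
    """Case #1: P(X < x), which equals P(X <= x) for a continuous variable."""
    return normal_cdf(x, mean, standard_dev)

def prob_above(x, mean, standard_dev):
    """Case #2: P(X > x), the complement of the cdf."""
    return 1 - normal_cdf(x, mean, standard_dev)

def prob_between(low, high, mean, standard_dev):
    """Case #3: P(low < X < high), a difference of two cdf values."""
    return normal_cdf(high, mean, standard_dev) - normal_cdf(low, mean, standard_dev)

# Hypothetical example values: mean 100, standard deviation 15.
mean, standard_dev = 100.0, 15.0
below = prob_below(85.0, mean, standard_dev)
above = prob_above(115.0, mean, standard_dev)
between = prob_between(85.0, 115.0, mean, standard_dev)
print(below)
print(above)
print(between)
print(below + between + above)  # the three regions cover the whole line
```

The last value should be 1 up to floating-point rounding. For x far out in the upper tail, 1 - cdf(x) loses precision; computing 0.5 * math.erfc((x - mean) / (standard_dev * math.sqrt(2))) directly is one way around that.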