Calculating Probability of a Random Variable in a Distribution in Python

Q: How do you find the probability distribution of a variable in Python?

You can generate a binomially distributed discrete random variable with the rvs() method of scipy.stats.binom, which takes $n$ (number of trials) and $p$ (probability of success) as shape parameters. To shift the distribution, use the loc parameter.
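For readers restricted to the standard library, the same idea can be sketched without scipy by summing independent Bernoulli trials. The names binomial_sample, rng and draws are made up for this illustration, and the seed is arbitrary; this is a rough stand-in, not how scipy implements it.

```python
import random

def binomial_sample(n, p, rng, loc=0):
    """Draw one binomial variate as the number of successes in n trials.

    loc shifts the result, mirroring the role of the loc parameter
    described above (an assumption made for this sketch).
    """
    successes = sum(1 for _ in range(n) if rng.random() < p)
    return successes + loc

rng = random.Random(42)  # seeded only so that runs are repeatable
draws = [binomial_sample(10, 0.3, rng) for _ in range(1000)]
sample_mean = sum(draws) / len(draws)  # should land near n * p = 3
```

Each draw costs n calls to the generator, so this is only sensible for small n.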

Q: How do you find the probability distribution of a random variable?

The probability distribution for a discrete random variable X can be represented by a formula, a table, or a graph, which provides p(x) = P(X=x) for all x. The probability distribution for a discrete random variable assigns nonzero probabilities to only a countable number of distinct x values.
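As a small illustration of the "table" form, the distribution of the number of heads in two fair coin tosses can be built as a dictionary mapping each value x to p(x). The example and the variable names are chosen here for illustration only.

```python
from fractions import Fraction
from itertools import product

# Enumerate the four equally likely outcomes of two fair coin tosses.
outcomes = list(product("HT", repeat=2))

# pmf[x] = P(X = x), where X is the number of heads.
pmf = {}
for outcome in outcomes:
    heads = outcome.count("H")
    pmf[heads] = pmf.get(heads, Fraction(0)) + Fraction(1, len(outcomes))

total = sum(pmf.values())  # probabilities over all x must add up to 1
```

Using Fraction keeps the probabilities exact, so the sum is exactly 1 rather than approximately 1.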

Q: How do you find the probability of a PDF in Python?

The cdf is simply the integral of the pdf from negative infinity to the value at which it is calculated. Thus to get the integral of the pdf over a range, you simply have to subtract the cdf values at the two end points of the range.
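A minimal sketch of that subtraction for a normal distribution, assuming Python 3.8 or later, where the standard library provides statistics.NormalDist. The function name prob_between and the sample numbers are illustrative.

```python
from statistics import NormalDist  # standard library, Python 3.8+

def prob_between(low, high, mean, stdev):
    """P(low < X < high) for X ~ Normal(mean, stdev), via cdf(high) - cdf(low)."""
    dist = NormalDist(mu=mean, sigma=stdev)
    return dist.cdf(high) - dist.cdf(low)

# Probability of falling within one standard deviation of the mean.
p = prob_between(-1.0, 1.0, 0.0, 1.0)
```

For the standard normal this gives roughly 0.68, the familiar one-sigma figure.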

Q: What is probability distribution Python?

In probability theory and statistics, a probability distribution is a function that gives how probable the different outcomes of an experiment are. It describes events in terms of their probabilities, taken over all possible outcomes.

Tags:

python

math

probability

probability-theory

Given a mean and standard-deviation defining a normal distribution, how would you calculate the following probabilities in pure-Python (i.e. no Numpy/Scipy or other packages not in the standard library)?

1. The probability of a random variable r where r < x or r <= x.
2. The probability of a random variable r where r > x or r >= x.
3. The probability of a random variable r where x > r > y.

I've found some libraries, like Pgnumerics, that provide functions for calculating these, but the underlying math is unclear to me.

Edit: To show this isn't homework, my working code for Python <= 2.6 is posted below, although I'm not sure it handles the boundary conditions correctly.

from math import exp, pi, sqrt

def erfcc(x):
    """
    Complementary error function.
    """
    z = abs(x)
    t = 1. / (1. + 0.5*z)
    r = t * exp(-z*z-1.26551223+t*(1.00002368+t*(.37409196+
        t*(.09678418+t*(-.18628806+t*(.27886807+
        t*(-1.13520398+t*(1.48851587+t*(-.82215223+
        t*.17087277)))))))))
    if (x >= 0.):
        return r
    else:
        return 2. - r

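def normcdf(x, mu, sigma):
    # Phi(x) = 0.5 * erfc(-(x - mu) / (sigma * sqrt(2)))
    t = x - mu
    y = 0.5 * erfcc(-t / (sigma * sqrt(2.0)))
    # Guard against approximation error pushing the result above 1.
    if y > 1.0:
        y = 1.0
    return y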

def normpdf(x, mu, sigma):
    u = (x-mu)/abs(sigma)
    y = (1/(sqrt(2*pi)*abs(sigma)))*exp(-u*u/2)
    return y

def normdist(x, mu, sigma, f):
    if f:
        y = normcdf(x,mu,sigma)
    else:
        y = normpdf(x,mu,sigma)
    return y

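def normrange(x1, x2, mu, sigma, f=True):
    """
    Calculates probability of random variable falling between two points.
    Only meaningful with f=True (cdf); with f=False the result is a
    difference of density values, not a probability.
    """
    p1 = normdist(x1, mu, sigma, f)
    p2 = normdist(x2, mu, sigma, f)
    return abs(p1 - p2)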
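One way to probe the boundary behaviour of the code above is to compare erfcc against math.erfc, which is available from Python 2.7 onward. The sketch below copies the coefficients from the erfcc listing into a list (the name _COEFFS, the grid and the tolerance are choices made here, not part of the original code) and measures the largest absolute difference.

```python
import math

# Coefficients copied from the erfcc listing above, constant term first.
_COEFFS = [-1.26551223, 1.00002368, 0.37409196, 0.09678418, -0.18628806,
           0.27886807, -1.13520398, 1.48851587, -0.82215223, 0.17087277]

def erfcc(x):
    """Same approximation as above, with the polynomial evaluated by Horner's rule."""
    z = abs(x)
    t = 1.0 / (1.0 + 0.5 * z)
    poly = 0.0
    for c in reversed(_COEFFS):
        poly = c + t * poly
    r = t * math.exp(-z * z + poly)
    return r if x >= 0.0 else 2.0 - r

def normcdf(x, mu, sigma):
    """Normal cdf built on erfcc, clamped at 1.0 as in the original."""
    return min(1.0, 0.5 * erfcc(-(x - mu) / (sigma * math.sqrt(2.0))))

# Compare against the built-in on a grid from -6 to 6 in steps of 0.1.
grid = [i / 10.0 for i in range(-60, 61)]
max_abs_error = max(abs(erfcc(x) - math.erfc(x)) for x in grid)
```

A case this comparison does not cover is sigma == 0, where the division inside normcdf raises ZeroDivisionError; whether that should be an error or a step function is a design decision left to the caller.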

asked Feb 25 '12 21:02

Cerin

1 Answer

All these are very similar: If you can compute #1 using a function cdf(x), then the solution to #2 is simply 1 - cdf(x), and for #3 it's cdf(x) - cdf(y).

Python has had the (Gauss) error function built in since version 2.7, so you can compute the cdf of the normal distribution using the equation from the article you linked to:

import math
# x, mean and standard_dev must already be defined as numbers
print(0.5 * (1 + math.erf((x - mean) / math.sqrt(2 * standard_dev**2))))

where mean is the mean and standard_dev is the standard deviation.

Some notes, since what you asked is relatively straightforward given the information in the article:

- The CDF of a random variable (say X) is the probability that X lies between -infinity and some limit, say x (lower case). For continuous distributions, the CDF is the integral of the pdf. The cdf is exactly what you described for #1: you want some normally distributed RV to be between -infinity and x (<= x).
- < and <=, as well as > and >=, are the same for continuous random variables, because the probability that the RV equals any single point is 0. So whether or not x itself is included doesn't matter when calculating probabilities for continuous distributions.
- The probabilities sum to 1: if X is not < x, then it is >= x. So if you have cdf(x), then 1 - cdf(x) is the probability that the random variable X >= x. Since >= is equivalent to > for continuous random variables, this is also the probability that X > x.
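Putting the three cases together, a minimal sketch built on math.erf might look like the following. The function names are invented for this example, and standard_dev is assumed to be positive.

```python
import math

def normal_cdf(x, mean, standard_dev):
    """P(X <= x) for X ~ Normal(mean, standard_dev); assumes standard_dev > 0."""
    return 0.5 * (1.0 + math.erf((x - mean) / (standard_dev * math.sqrt(2.0))))

def prob_less(x, mean, standard_dev):
    """Case #1: P(r < x), which equals P(r <= x) for a continuous variable."""
    return normal_cdf(x, mean, standard_dev)

def prob_greater(x, mean, standard_dev):
    """Case #2: P(r > x) = 1 - cdf(x)."""
    return 1.0 - normal_cdf(x, mean, standard_dev)

def prob_in_range(y, x, mean, standard_dev):
    """Case #3: P(y < r < x) = cdf(x) - cdf(y), with y below x."""
    return normal_cdf(x, mean, standard_dev) - normal_cdf(y, mean, standard_dev)

# Example with a standard normal: about 95% of the mass lies within +/- 1.96.
central = prob_in_range(-1.96, 1.96, 0.0, 1.0)
```

Far out in the upper tail, 1 - cdf(x) loses precision because cdf(x) rounds to 1.0; computing 0.5 * math.erfc((x - mean) / (standard_dev * math.sqrt(2.0))) directly is one way around that.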

answered Sep 29 '22 10:09

ameer
