Best way to write a Python function that integrates a gaussian?

Tags:

In attempting to use scipy's quad method to integrate a gaussian (lets say there's a gaussian method named gauss), I was having problems passing needed parameters to gauss and leaving quad to do the integration over the correct variable. Does anyone have a good example of how to use quad w/ a multidimensional function?

But this led me to a more grand question about the best way to integrate a gaussian in general. I didn't find a gaussian integrate in scipy (to my surprise). My plan was to write a simple gaussian function and pass it to quad (or maybe now a fixed width integrator). What would you do?

Edit: Fixed-width meaning something like trapz that uses a fixed dx to calculate areas under a curve.

What I've come to so far is a method make___gauss that returns a lambda function that can then go into quad. This way I can make a normal function with the average and variance I need before integrating.

def make_gauss(N, sigma, mu):
    return (lambda x: N/(sigma * (2*numpy.pi)**.5) *
            numpy.e ** (-(x-mu)**2/(2 * sigma**2)))

quad(make_gauss(N=10, sigma=2, mu=0), -inf, inf)

When I tried passing a general gaussian function (that needs to be called with x, N, mu, and sigma) and filling in some of the values using quad like

quad(gen_gauss, -inf, inf, (10,2,0))

the parameters 10, 2, and 0 did NOT necessarily match N=10, sigma=2, mu=0, which prompted the more extended definition.

The erf(z) in scipy.special would require me to define exactly what t is initially, but it nice to know it is there.

501

asked Feb 04 '09 03:02

physicsmichael

2 Answers

Okay, you appear to be pretty confused about several things. Let's start at the beginning: you mentioned a "multidimensional function", but then go on to discuss the usual one-variable Gaussian curve. This is not a multidimensional function: when you integrate it, you only integrate one variable (x). The distinction is important to make, because there is a monster called a "multivariate Gaussian distribution" which is a true multidimensional function and, if integrated, requires integrating over two or more variables (which uses the expensive Monte Carlo technique I mentioned before). But you seem to just be talking about the regular one-variable Gaussian, which is much easier to work with, integrate, and all that.

The one-variable Gaussian distribution has two parameters, sigma and mu, and is a function of a single variable we'll denote x. You also appear to be carrying around a normalization parameter n (which is useful in a couple of applications). Normalization parameters are usually not included in calculations, since you can just tack them back on at the end (remember, integration is a linear operator: int(n*f(x), x) = n*int(f(x), x) ). But we can carry it around if you like; the notation I like for a normal distribution is then

N(x | mu, sigma, n) := (n/(sigma*sqrt(2*pi))) * exp((-(x-mu)^2)/(2*sigma^2))

(read that as "the normal distribution of x given sigma, mu, and n is given by...") So far, so good; this matches the function you've got. Notice that the only true variable here is x: the other three parameters are fixed for any particular Gaussian.

Now for a mathematical fact: it is provably true that all Gaussian curves have the same shape, they're just shifted around a little bit. So we can work with N(x|0,1,1), called the "standard normal distribution", and just translate our results back to the general Gaussian curve. So if you have the integral of N(x|0,1,1), you can trivially calculate the integral of any Gaussian. This integral appears so frequently that it has a special name: the error function erf. Because of some old conventions, it's not exactly erf; there are a couple additive and multiplicative factors also being carried around.

If Phi(z) = integral(N(x|0,1,1), -inf, z); that is, Phi(z) is the integral of the standard normal distribution from minus infinity up to z, then it's true by the definition of the error function that

Phi(z) = 0.5 + 0.5 * erf(z / sqrt(2)).

Likewise, if Phi(z | mu, sigma, n) = integral( N(x|sigma, mu, n), -inf, z); that is, Phi(z | mu, sigma, n) is the integral of the normal distribution given parameters mu, sigma, and n from minus infinity up to z, then it's true by the definition of the error function that

Phi(z | mu, sigma, n) = (n/2) * (1 + erf((x - mu) / (sigma * sqrt(2)))).

Take a look at the Wikipedia article on the normal CDF if you want more detail or a proof of this fact.

Okay, that should be enough background explanation. Back to your (edited) post. You say "The erf(z) in scipy.special would require me to define exactly what t is initially". I have no idea what you mean by this; where does t (time?) enter into this at all? Hopefully the explanation above has demystified the error function a bit and it's clearer now as to why the error function is the right function for the job.

Your Python code is OK, but I would prefer a closure over a lambda:

def make_gauss(N, sigma, mu):
    k = N / (sigma * math.sqrt(2*math.pi))
    s = -1.0 / (2 * sigma * sigma)
    def f(x):
        return k * math.exp(s * (x - mu)*(x - mu))
    return f

Using a closure enables precomputation of constants k and s, so the returned function will need to do less work each time it's called (which can be important if you're integrating it, which means it'll be called many times). Also, I have avoided any use of the exponentiation operator **, which is slower than just writing the squaring out, and hoisted the divide out of the inner loop and replaced it with a multiply. I haven't looked at all at their implementation in Python, but from my last time tuning an inner loop for pure speed using raw x87 assembly, I seem to remember that adds, subtracts, or multiplies take about 4 CPU cycles each, divides about 36, and exponentiation about 200. That was a couple years ago, so take those numbers with a grain of salt; still, it illustrates their relative complexity. As well, calculating exp(x) the brute-force way is a very bad idea; there are tricks you can take when writing a good implementation of exp(x) that make it significantly faster and more accurate than a general a**b style exponentiation.

I've never used the numpy version of the constants pi and e; I've always stuck with the plain old math module's versions. I don't know why you might prefer either one.

I'm not sure what you're going for with the quad() call. quad(gen_gauss, -inf, inf, (10,2,0)) ought to integrate a renormalized Gaussian from minus infinity to plus infinity, and should always spit out 10 (your normalization factor), since the Gaussian integrates to 1 over the real line. Any answer far from 10 (I wouldn't expect exactly 10 since quad() is only an approximation, after all) means something is screwed up somewhere... hard to say what's screwed up without knowing the actual return value and possibly the inner workings of quad().

Hopefully that has demystified some of the confusion, and explained why the error function is the right answer to your problem, as well as how to do it all yourself if you're curious. If any of my explanation wasn't clear, I suggest taking a quick look at Wikipedia first; if you still have questions, don't hesitate to ask.

answered Oct 15 '22 06:10

kquinn

scipy ships with the "error function", aka Gaussian integral:

import scipy.special
help(scipy.special.erf)

answered Oct 15 '22 07:10

Mr Fooz

Related questions
                            
                                Troubles installing mysqlclient with pip3
                            
                                Error while opening port in Python using TI Chronos
                            
                                virtualenv command not found after installed with MacPorts
                            
                                How can I use a 'for' loop for just one variable in a function which depends on two variables?
                            
                                Is there a better way to convert a list to a dictionary in Python with keys but no values?
                            
                                PyQt4 signals and slots
                            
                                Syntax error iterating over tuple in python
                            
                                Save unicode in redis but fetch error
                            
                                Django/MySQL-python - Connection using old (pre-4.1.1) authentication protocol refused (client option 'secure_auth' enabled)
                            
                                Python HTML removal
                            
                                Plane fitting to 4 (or more) XYZ points
                            
                                Python, numpy sort array
                            
                                Choose largest odd number python
                            
                                Appending two arrays together in Python
                            
                                What is a better way to define a class' __init__ method with optional keyword arguments?
                            
                                a StringIO like class, that extends django.core.files.File
                            
                                Counting Letter Frequency in a String (Python) [duplicate]
                            
                                Which of these scripting languages is more appropriate for pen-testing? [closed]
                            
                                Would one have to know the machine architecture to write code?
                            
                                Scrapy installed, but won't run from the command line

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Best way to write a Python function that integrates a gaussian?

Tags:

python

scipy

gaussian

integral

physicsmichael

People also ask

2 Answers

kquinn

Mr Fooz

Recent Activity

Donate For Us