Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What specific requirements does the function passed to scipy.optimize.curve_fit need to fulfill in order to run?

I am working on fitting statistical models to distributions using matplotlib's hist function. For example my code fits an exponential distribution using the following code:

    try:

        def expDist(x, a, x0):
            return a*(exp(-(x/x0))/x0)

        self.n, self.bins, patches = plt.hist(self.getDataSet(), self.getDatasetSize()/10, normed=1, facecolor='blue', alpha = 0.55)
        popt,pcov = curve_fit(expDist,self.bins[:-1], self.n, p0=[1,mean])
        print "Fitted gaussian curve to data with params a %f, x0 %f" % (popt[0], popt[1])
        self.a = popt[0]
        self.x0 = popt[1]

        self.fitted = True
    except RuntimeError:
        print "Unable to fit data to exponential curve"

Which runs fine, but when I modify it to do the same thing for a uniform distribution between a & b,

    def uniDist(x, a, b):
        if((x >= a)and(x <= b)):
            return float(1.0/float(b-a))
        else:
            return 0.000

    try:



        self.n, self.bins, patches = plt.hist(self.getDataSet(), self.getDatasetSize()/10, normed=1, facecolor='blue', alpha = 0.55)
        popt,pcov = curve_fit(uniDist,self.bins[:-1], self.n, p0=[a, b])
        print "Fitted uniform distribution curve to data with params a %f, b %f" % (popt[0], popt[1])
        self.a = popt[0]
        self.b = popt[1]

        self.fitted = True
    except RuntimeError:
        print "Unable to fit data to uniform distribution pdf curve"

The code crashes, with

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

The issue seems to be that somewhere in curve_fit, the function is trying to call the function to be fitted (expDist, and uniDist in this case) with an iterable set of values, but I can't figure out how the expDist function is able to take anything iterable without crashing?

like image 222
BruceJohnJennerLawso Avatar asked Oct 29 '22 13:10

BruceJohnJennerLawso


1 Answers

Your suspicion is partly correct. curve_fit indeed passes an iterable to the function, but not just any iterable: a numpy.ndarray. These happen to possess vectorized arithmetic operators, so

a*(exp(-(x/x0))/x0)

will simply work element-wise over the input arrays without any errors (and with correct output). There isn't even much magic involved: for each evaluation of the function, the parameters a and x0 will be scalars, only x is an array.

Now, the problem with uniDist is that it doesn't only contain arithmetic operators: it contains comparison operators as well. These work fine as long as only a single array is compared to a scalar:

>>> import numpy as np
>>> a = np.arange(5)
>>> a
array([0, 1, 2, 3, 4])
>>> a>2
array([False, False, False,  True,  True], dtype=bool)

The above demonstrates that using comparison operators on an array and a scalar will again produce element-wise results. The error you see arises when you try to apply a logical operator to two of these boolean arrays:

>>> a>2
array([False, False, False,  True,  True], dtype=bool)
>>> a<4
array([ True,  True,  True,  True, False], dtype=bool)
>>> (a>2) and (a<4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

The error message is a bit confusing. It can be traced back to the fact that python would try to come up with a single result for array1 and array2 (which in native python would return either array based on their emptiness). However, numpy suspects that this is not what you want to do, and resists the temptation to guess.

Since you want your function to operate element-wise on the two boolean arrays (which come from the comparison operation), you'll have to use the & operator. This is "binary and" in native python, but for numpy arrays this gives you the element-wise "logical and" of the arrays. You could also use numpy.logical_and (or in your case scipy.logical_and) to be more explicit:

>>> (a>2) & (a<4)
array([False, False, False,  True, False], dtype=bool)
>>> np.logical_and(a>2,a<4)
array([False, False, False,  True, False], dtype=bool)

Note that for the & case you always have to parenthesize your comparisons, since again a>2&a<4 would be ambiguous (to the programmer) and wrong (considering what you want it to do). Since the "binary and" of booleans will behave exactly as you'd expect it, it is safe to rewrite your function to use & instead of and for comparing the two comparisons.

However, there is still one step you'll need to change: in case of ndarray inputs, the if will also behave differently. Python can't help but make a single choice in an if, which is also true if you put an array into it. But what you actually want to do is to constrain the elements of your output element-wise (again). So you either have to loop over your array (don't), or do this choice again in a vectorized way. The latter is idiomatic using numpy/scipy:

import scipy as sp
def uniDist(x, a, b):
    return sp.where((a<=x) & (x<=b), 1.0/(b-a), 0.0)

This (namely numpy.where) will return an array the same size as x. For the elements where the condition is True, the value of the output will be 1/(b-a). For the rest the output is 0. For scalar x, the return value is a numpy scalar. Note that I removed the float conversion in the above example, since having 1.0 in the numerator will definitely give you true division, despite your using python 2. Although I would suggest using python 3, or at least from __future__ import division.


Minor note: even for a scalar case I would suggest making use of python's operator chaining for the comparison, which lends itself to this purpose. What I mean is that you can simply do if a <= x <= b: ..., and unlike most languages, this will be functionally equivalent to what you wrote (but prettier).

like image 195