I am working on fitting statistical models to distributions using matplotlib's hist
function. For example my code fits an exponential distribution using the following code:
try:
def expDist(x, a, x0):
return a*(exp(-(x/x0))/x0)
self.n, self.bins, patches = plt.hist(self.getDataSet(), self.getDatasetSize()/10, normed=1, facecolor='blue', alpha = 0.55)
popt,pcov = curve_fit(expDist,self.bins[:-1], self.n, p0=[1,mean])
print "Fitted gaussian curve to data with params a %f, x0 %f" % (popt[0], popt[1])
self.a = popt[0]
self.x0 = popt[1]
self.fitted = True
except RuntimeError:
print "Unable to fit data to exponential curve"
Which runs fine, but when I modify it to do the same thing for a uniform distribution between a
& b
,
def uniDist(x, a, b):
if((x >= a)and(x <= b)):
return float(1.0/float(b-a))
else:
return 0.000
try:
self.n, self.bins, patches = plt.hist(self.getDataSet(), self.getDatasetSize()/10, normed=1, facecolor='blue', alpha = 0.55)
popt,pcov = curve_fit(uniDist,self.bins[:-1], self.n, p0=[a, b])
print "Fitted uniform distribution curve to data with params a %f, b %f" % (popt[0], popt[1])
self.a = popt[0]
self.b = popt[1]
self.fitted = True
except RuntimeError:
print "Unable to fit data to uniform distribution pdf curve"
The code crashes, with
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
The issue seems to be that somewhere in curve_fit
, the function is trying to call the function to be fitted (expDist
, and uniDist
in this case) with an iterable set of values, but I can't figure out how the expDist
function is able to take anything iterable without crashing?
Your suspicion is partly correct. curve_fit
indeed passes an iterable to the function, but not just any iterable: a numpy.ndarray
. These happen to possess vectorized arithmetic operators, so
a*(exp(-(x/x0))/x0)
will simply work element-wise over the input arrays without any errors (and with correct output). There isn't even much magic involved: for each evaluation of the function, the parameters a
and x0
will be scalars, only x
is an array.
Now, the problem with uniDist
is that it doesn't only contain arithmetic operators: it contains comparison operators as well. These work fine as long as only a single array is compared to a scalar:
>>> import numpy as np
>>> a = np.arange(5)
>>> a
array([0, 1, 2, 3, 4])
>>> a>2
array([False, False, False, True, True], dtype=bool)
The above demonstrates that using comparison operators on an array and a scalar will again produce element-wise results. The error you see arises when you try to apply a logical operator to two of these boolean arrays:
>>> a>2
array([False, False, False, True, True], dtype=bool)
>>> a<4
array([ True, True, True, True, False], dtype=bool)
>>> (a>2) and (a<4)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
The error message is a bit confusing. It can be traced back to the fact that python would try to come up with a single result for array1 and array2
(which in native python would return either array based on their emptiness). However, numpy suspects that this is not what you want to do, and resists the temptation to guess.
Since you want your function to operate element-wise on the two boolean arrays (which come from the comparison operation), you'll have to use the &
operator. This is "binary and" in native python, but for numpy arrays this gives you the element-wise "logical and" of the arrays. You could also use numpy.logical_and
(or in your case scipy.logical_and
) to be more explicit:
>>> (a>2) & (a<4)
array([False, False, False, True, False], dtype=bool)
>>> np.logical_and(a>2,a<4)
array([False, False, False, True, False], dtype=bool)
Note that for the &
case you always have to parenthesize your comparisons, since again a>2&a<4
would be ambiguous (to the programmer) and wrong (considering what you want it to do). Since the "binary and" of booleans will behave exactly as you'd expect it, it is safe to rewrite your function to use &
instead of and
for comparing the two comparisons.
However, there is still one step you'll need to change: in case of ndarray
inputs, the if
will also behave differently. Python can't help but make a single choice in an if
, which is also true if you put an array into it. But what you actually want to do is to constrain the elements of your output element-wise (again). So you either have to loop over your array (don't), or do this choice again in a vectorized way. The latter is idiomatic using numpy/scipy:
import scipy as sp
def uniDist(x, a, b):
return sp.where((a<=x) & (x<=b), 1.0/(b-a), 0.0)
This (namely numpy.where
) will return an array the same size as x
. For the elements where the condition is True
, the value of the output will be 1/(b-a)
. For the rest the output is 0
. For scalar x
, the return value is a numpy scalar. Note that I removed the float
conversion in the above example, since having 1.0
in the numerator will definitely give you true division, despite your using python 2. Although I would suggest using python 3, or at least from __future__ import division
.
Minor note: even for a scalar case I would suggest making use of python's operator chaining for the comparison, which lends itself to this purpose. What I mean is that you can simply do if a <= x <= b: ...
, and unlike most languages, this will be functionally equivalent to what you wrote (but prettier).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With