I want to know the distribution of my data points, so first I plotted the histogram of my data. My histogram looks like the following:
<img src="https://i.stack.imgur.com/HQOYp.png" alt="my histogram">
Second, in order to fit them to a distribution, here's the code I wrote:
<pre class="prettyprint"><code>size = 20000
x = scipy.arange(size)
# fit
param = scipy.stats.gamma.fit(y)
pdf_fitted = scipy.stats.gamma.pdf(x, *param[:-2], loc = param[-2], scale = param[-1]) * size
plt.plot(pdf_fitted, color = 'r')

# plot the histogram
plt.hist(y)

plt.xlim(0, 0.3)
plt.show()
</code></pre>
The result is:
<img src="https://i.stack.imgur.com/o5v1E.png" alt="enter image description here">
What am I doing wrong?

Your data does not appear to be gamma-distributed, but assuming it is, you could fit it like this:
<pre class="prettyprint"><code>import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

gamma = stats.gamma
# generate sample data with known parameters
a, loc, scale = 3, 0, 2
size = 20000
y = gamma.rvs(a, loc, scale, size=size)

# evaluate the pdf over the range of the data, not over range(size)
x = np.linspace(0, y.max(), 100)
# fit, holding loc fixed at 0
param = gamma.fit(y, floc=0)
pdf_fitted = gamma.pdf(x, *param)
plt.plot(x, pdf_fitted, color='r')

# plot the histogram, normalized so that its area is 1
# (density=True was called normed=True in older Matplotlib releases)
plt.hist(y, density=True, bins=30)

plt.show()
</code></pre>
<img src="https://i.stack.imgur.com/tgrEP.png" alt="enter image description here">
<ul>
<li>The area under the pdf (over the entire domain) equals 1. The area under the histogram equals 1 if you use <code>density=True</code> (called <code>normed=True</code> in older Matplotlib releases). With both normalized, the curve and the bars are on the same scale, so the pdf does not need to be multiplied by <code>size</code>.</li>
<li>In your code, <code>x</code> has length <code>size</code> (i.e. 20000), and <code>pdf_fitted</code> has the same shape as <code>x</code>. If we call <code>plot</code> and specify only the y-values, e.g. <code>plt.plot(pdf_fitted)</code>, then the values are plotted against their indices, i.e. over the x-range <code>[0, size]</code>. That is much too large an x-range. Since the histogram is going to use an x-range of <code>[min(y), max(y)]</code>, we must choose <code>x</code> to span a similar range: <code>x = np.linspace(0, y.max())</code>, and call <code>plot</code> with both the x- and y-values specified, e.g. <code>plt.plot(x, pdf_fitted)</code>.</li>
<li>As Warren Weckesser points out in the comments, for most applications you know the gamma distribution's domain begins at 0. If that is the case, use <code>floc=0</code> to hold the <code>loc</code> parameter at 0. Without <code>floc=0</code>, <code>gamma.fit</code> will also try to find the best-fit value for the <code>loc</code> parameter, which, given the vagaries of data, will generally not be exactly zero.</li>
</ul>
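The first and third points can be checked numerically without plotting. The following is a minimal sketch, not part of the original answer: it assumes synthetic gamma data with the same parameters as above, and the seed and variable names are arbitrary choices made for illustration.

```python
import numpy as np
import scipy.stats as stats

# Synthetic sample with known parameters (shape 3, loc 0, scale 2).
# The seed is an arbitrary choice, used only to make the run repeatable.
y = stats.gamma.rvs(3, loc=0, scale=2, size=20000, random_state=0)

# Density-normalized histogram: bar heights times bin widths sum to 1.
heights, edges = np.histogram(y, bins=30, density=True)
hist_area = np.sum(heights * np.diff(edges))

# Fit once with loc held at 0, and once with loc left free.
a_fixed, loc_fixed, scale_fixed = stats.gamma.fit(y, floc=0)
a_free, loc_free, scale_free = stats.gamma.fit(y)

print(f"histogram area: {hist_area:.6f}")
print(f"floc=0  : a={a_fixed:.3f}, loc={loc_fixed:.3f}, scale={scale_fixed:.3f}")
print(f"free loc: a={a_free:.3f}, loc={loc_free:.3f}, scale={scale_free:.3f}")
```

With `floc=0` the returned `loc` is exactly the value that was fixed, while the free fit reports whatever `loc` the optimizer settles on. Comparing the two lines of output shows how much the extra free parameter shifts the shape and scale estimates for a given sample.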

Fit a distribution to a histogram


asked Mar 23 '15 10:03

aloha



answered Sep 28 '22 17:09

unutbu


Tags:

python

scipy

data-fitting
