I'm new to Python and coming from the R world. I'm trying to fit distributions to sample data using SciPy and having good success. I can make distribution.fit(data)
return sane results. What I've been unable to do is create the goodness of fit statistics which I'm used to with the fitdistrplus
package in R. Is there a common method for comparing "best fit" from a number of different distributions with SciPy?
I'm looking for something like the Kolmogorov-Smirnov, Cramér-von Mises, or Anderson-Darling tests.
If you want to know the "goodness of fit" of a regression, use the R-squared statistic. R-squared tells you how much of the observed variance in the outcome is explained by the input. The Python example in this answer returns 0.801, so about 80.1% of the variance in y is explained by x.
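The code behind that figure is not reproduced here, so the following is only a rough sketch of how R-squared can be computed for a simple linear fit. The `x` and `y` values are made up for illustration (they are not the data that produced 0.801), and `r_squared` is a hypothetical helper name. Keep in mind that R-squared describes a regression fit; it is not a test of whether a sample follows a given distribution.

```python
# Made-up data for illustration only (not the data behind the 0.801 figure).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def r_squared(x, y):
    """R-squared of the ordinary least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Least-squares slope and intercept.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Share of the total variance in y that the fitted line explains.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


r2 = r_squared(x, y)
print(f"R-squared: {r2:.3f}")
```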
What is the Chi-square goodness of fit test? The Chi-square goodness of fit test is a statistical hypothesis test used to determine whether a variable is likely to come from a specified distribution or not. It is often used to evaluate whether sample data is representative of the full population.
Note that the p-value corresponds to a Chi-Square value with n-1 degrees of freedom (dof), where n is the number of different categories. In this case, dof = 5-1 = 4. You can use the Chi-Square to P Value Calculator to confirm that the p-value that corresponds to X² = 4.36 with dof = 4 is 0.35947.
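The p-value quoted above can also be checked in Python instead of an online calculator. A minimal sketch using `scipy.stats.chi2.sf` and `scipy.stats.chisquare` follows; the `observed` and `expected` counts are assumptions, chosen only because five categories with these counts reproduce X² = 4.36. They are not taken from the text above.

```python
from scipy import stats

# Hypothetical counts for 5 categories, chosen so the statistic equals 4.36.
observed = [50, 60, 40, 47, 53]
expected = [50, 50, 50, 50, 50]

# Chi-square goodness-of-fit test; observed and expected totals must match.
result = stats.chisquare(f_obs=observed, f_exp=expected)

# Degrees of freedom: number of categories minus one.
dof = len(observed) - 1

# Upper-tail probability of a chi-square variable with `dof` degrees of freedom.
p_from_stat = stats.chi2.sf(4.36, df=dof)

print(f"statistic = {result.statistic:.2f}, p-value = {result.pvalue:.5f}")
print(f"p-value from X^2 = 4.36 with dof = {dof}: {p_from_stat:.5f}")
```

Both routes should agree with the 0.35947 quoted above, since `chisquare` uses the same n-1 degrees of freedom by default.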
The chi-square goodness of fit test is a hypothesis test. It allows you to draw conclusions about the distribution of a population based on a sample. Using the chi-square goodness of fit test, you can test whether the goodness of fit is “good enough” to conclude that the population follows the distribution.
See the scipy.stats library: http://docs.scipy.org/doc/scipy/reference/stats.html
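It contains K-S (scipy.stats.kstest) and Anderson-Darling (scipy.stats.anderson). Cramér-von Mises was apparently missing when this was written, but SciPy 1.6 and later provide it as scipy.stats.cramervonmises.
There are also goodness-of-fit tests in statsmodels.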
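As a rough sketch of how these tests could be used to compare candidate fits, the snippet below fits a few distributions to a synthetic sample and ranks them by their K-S statistic. The sample, the list of candidates, and the variable names are all assumptions made for illustration, and `scipy.stats.cramervonmises` requires SciPy 1.6 or later. Because the parameters are estimated from the same data being tested, the usual p-values are optimistic, so the sketch compares the statistics themselves (smaller means a closer fit).

```python
import numpy as np
from scipy import stats

# Synthetic sample for illustration; replace with your own data.
rng = np.random.default_rng(0)
data = rng.normal(loc=10.0, scale=2.0, size=500)

# Hypothetical set of candidate distributions to compare.
candidates = {
    "norm": stats.norm,
    "logistic": stats.logistic,
    "expon": stats.expon,
}

results = {}
for name, dist in candidates.items():
    params = dist.fit(data)  # maximum-likelihood parameter estimates
    ks = stats.kstest(data, dist.cdf, args=params)
    cvm = stats.cramervonmises(data, dist.cdf, args=params)
    results[name] = (ks.statistic, cvm.statistic)

# Rank candidates by K-S statistic, best (smallest) first.
for name, (ks_stat, cvm_stat) in sorted(results.items(), key=lambda kv: kv[1][0]):
    print(f"{name:10s} KS = {ks_stat:.4f}  CvM = {cvm_stat:.4f}")

# Anderson-Darling in SciPy supports only a fixed set of distributions by name.
ad = stats.anderson(data, dist="norm")
print(f"Anderson-Darling statistic (norm): {ad.statistic:.4f}")
```

This is close in spirit to the comparison table fitdistrplus prints in R, although it is not a drop-in equivalent.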