I wanted to use scipy.stats.probplot()
to perform some gaussianity test on mydata
.
from scipy import stats
_,fit=stats.probplot(mydata, dist=stats.norm,plot=ax)
goodness_fit="%.2f" %fit[2]
The documentation says:
Generates a probability plot of sample data against the quantiles of a specified theoretical distribution (the normal distribution by default). probplot optionally calculates a best-fit line for the data and plots the results using Matplotlib or a given plot function. probplot generates a probability plot, which should not be confused with a Q-Q or a P-P plot. Statsmodels has more extensive functionality of this type, see statsmodels.api.ProbPlot.
But if google probability plot, it is a common name for P-P plot, while the documentation says not to confuse the two things.
Now I am confused, what is this function doing?
probplot. Calculate quantiles for a probability plot, and optionally show the plot. Generates a probability plot of sample data against the quantiles of a specified theoretical distribution (the normal distribution by default).
In statistics, a P–P plot (probability–probability plot or percent–percent plot or P value plot) is a probability plot for assessing how closely two data sets agree, or for assessing how closely a dataset fits a particular model.
A P-P plot compares the empirical cumulative distribution function of a data set with a specified theoretical cumulative distribution function F(·). A Q-Q plot compares the quantiles of a data distribution with the quantiles of a standardized theoretical distribution from a specified family of distributions.
The theoretical quantile-quantile plot is a tool to explore how a batch of numbers deviates from a theoretical distribution and to visually assess whether the difference is significant for the purpose of the analysis.
I looked since hours for an answer to this question, and this can be found in the Scipy/Statsmodel code comments.
In Scipy, comment at https://github.com/scipy/scipy/blob/abdab61d65dda1591f9d742230f0d1459fd7c0fa/scipy/stats/morestats.py#L523 says:
probplot
generates a probability plot, which should not be confused with a Q-Q or a P-P plot. Statsmodels has more extensive functionality of this type, seestatsmodels.api.ProbPlot
.
So, now, let's look at Statsmodels, where comment at https://github.com/statsmodels/statsmodels/blob/66fc298c51dc323ce8ab8564b07b1b3797108dad/statsmodels/graphics/gofplots.py#L58 says:
ppplot : Probability-Probability plot Compares the sample and theoretical probabilities (percentiles).
qqplot : Quantile-Quantile plot Compares the sample and theoretical quantiles
probplot : Probability plot Same as a Q-Q plot, however probabilities are shown in the scale of the theoretical distribution (x-axis) and the y-axis contains unscaled quantiles of the sample data.
So, difference between QQ plot and Probability plot, in these modules, is related to the scales.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With