
How to interpret scipy.stats.probplot results?

I want to use scipy.stats.probplot() to test whether mydata is Gaussian.

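import matplotlib.pyplot as plt
from scipy import stats

fig, ax = plt.subplots()
# fit is (slope, intercept, r) for the least-squares line through the plotted points
_, fit = stats.probplot(mydata, dist=stats.norm, plot=ax)
goodness_fit = "%.2f" % fit[2]  # r, the correlation coefficient of that line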

The documentation says:

Generates a probability plot of sample data against the quantiles of a specified theoretical distribution (the normal distribution by default). probplot optionally calculates a best-fit line for the data and plots the results using Matplotlib or a given plot function. probplot generates a probability plot, which should not be confused with a Q-Q or a P-P plot. Statsmodels has more extensive functionality of this type, see statsmodels.api.ProbPlot.

But if I google "probability plot", it turns out to be a common name for a P-P plot, while the documentation says not to confuse the two.

Now I am confused: what is this function actually doing?

00__00__00 asked Jan 05 '18 06:01





1 Answer

I spent hours looking for an answer to this question, and it turns out to be in the SciPy and statsmodels code comments.

In SciPy, the comment at https://github.com/scipy/scipy/blob/abdab61d65dda1591f9d742230f0d1459fd7c0fa/scipy/stats/morestats.py#L523 says:

probplot generates a probability plot, which should not be confused with a Q-Q or a P-P plot. Statsmodels has more extensive functionality of this type, see statsmodels.api.ProbPlot.
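To see what the function actually computes, here is a minimal sketch. The sample is synthetic and its parameters (mean 5, standard deviation 2, 200 points) are made up for illustration. Called without plot=, probplot only returns numbers: the theoretical quantiles paired with the sorted sample, plus the fitted line.

```python
import numpy as np
from scipy import stats

# Synthetic data (an assumption for illustration, not the asker's mydata).
rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=200)

# Without plot=, nothing is drawn; only the plot coordinates and fit are returned.
(osm, osr), (slope, intercept, r) = stats.probplot(sample, dist="norm")

# osm: theoretical quantiles of the standard normal (order statistic medians).
# osr: the sample itself, sorted, in its original units.
print("osr is the sorted sample:", np.allclose(osr, np.sort(sample)))

# For normal data the fitted line roughly recovers scale (slope) and location (intercept).
print("slope=%.2f intercept=%.2f r=%.4f" % (slope, intercept, r))
```

So the points are (theoretical quantile, ordered sample value) pairs, and r close to 1 indicates the points lie close to a straight line.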

Now let's look at statsmodels, where the comment at https://github.com/statsmodels/statsmodels/blob/66fc298c51dc323ce8ab8564b07b1b3797108dad/statsmodels/graphics/gofplots.py#L58 says:

ppplot : Probability-Probability plot
    Compares the sample and theoretical probabilities (percentiles).

qqplot : Quantile-Quantile plot
    Compares the sample and theoretical quantiles

probplot : Probability plot
    Same as a Q-Q plot, however probabilities are shown in the scale of the theoretical distribution (x-axis) and the y-axis contains unscaled quantiles of the sample data.

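So in these modules, the difference between a Q-Q plot and a probability plot comes down to the axis scales: the points are the same, but the probability plot labels the x-axis with probabilities instead of quantiles.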
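The three definitions can be sketched by hand with NumPy and SciPy. This is only an illustration under stated assumptions: the sample is synthetic, the plotting positions (i - 0.5) / n are one common convention rather than necessarily what either library uses, and the variable names are my own.

```python
import numpy as np
from scipy import stats

# Synthetic, sorted sample (an assumption for illustration).
rng = np.random.default_rng(1)
sample = np.sort(rng.normal(size=100))
n = sample.size

# Plotting positions: one common convention, chosen here for simplicity.
probs = (np.arange(1, n + 1) - 0.5) / n

# Q-Q plot: theoretical quantiles against sample quantiles (data units).
qq_x = stats.norm.ppf(probs)
qq_y = sample

# P-P plot: theoretical against empirical cumulative probabilities (both in [0, 1]).
theoretical_probs = stats.norm.cdf(sample)
empirical_probs = probs

# Probability plot: same points as the Q-Q plot, but the x-axis ticks sit at
# quantile positions and are labelled with the matching probabilities.
tick_probs = np.array([0.01, 0.10, 0.50, 0.90, 0.99])
tick_positions = stats.norm.ppf(tick_probs)
tick_labels = ["%g%%" % (100 * p) for p in tick_probs]
print(list(zip(tick_labels, np.round(tick_positions, 2))))
```

With Matplotlib, passing tick_positions and tick_labels to ax.set_xticks and ax.set_xticklabels on a Q-Q scatter would turn it into a probability plot in the sense quoted above.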

mike123 answered Sep 23 '22 18:09
