How would you create a qq-plot using Python?
Assume that you have a large set of measurements and are using some plotting function that takes XY values as input. The function should plot the quantiles of the measurements against the corresponding quantiles of some distribution (normal, uniform, ...).
The resulting plot then lets us evaluate whether our measurements follow the assumed distribution or not.
http://en.wikipedia.org/wiki/Quantile-quantile_plot
Both R and Matlab provide ready-made functions for this, but I am wondering what the cleanest method for implementing it in Python would be.
A Q-Q plot, short for “quantile-quantile” plot, is often used to assess whether or not a set of data potentially came from some theoretical distribution. In most cases, this type of plot is used to determine whether or not a set of data follows a normal distribution.
The quantile-quantile (q-q) plot is a graphical technique for determining if two data sets come from populations with a common distribution. A q-q plot is a plot of the quantiles of the first data set against the quantiles of the second data set.
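To make the two-sample definition concrete, here is a minimal sketch that evaluates both data sets at the same probabilities and returns the matching quantiles as XY values. The function name qq_two_samples, the choice of 100 quantiles, and the (i - 0.5) / n probability grid are assumptions made for illustration, not part of any library.

```python
import numpy as np

def qq_two_samples(a, b, n_quantiles=100):
    """Return matching quantiles of two 1-D samples as (qa, qb)."""
    # Evaluate both samples at the same probabilities, strictly inside (0, 1).
    # The (i - 0.5) / n grid is one common convention (an assumption here).
    probs = (np.arange(1, n_quantiles + 1) - 0.5) / n_quantiles
    qa = np.quantile(np.asarray(a, dtype=float), probs)
    qb = np.quantile(np.asarray(b, dtype=float), probs)
    return qa, qb

rng = np.random.default_rng(0)
first = rng.normal(loc=20, scale=5, size=2000)
second = rng.normal(loc=20, scale=5, size=500)

# The samples may differ in size because both are reduced to the same grid.
qa, qb = qq_two_samples(first, second)
```

Plotting qa against qb gives the q-q plot. Points that stay close to the line y = x suggest that the two samples come from a common distribution.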
Update: As folks have pointed out, this answer is not correct. A probplot is different from a quantile-quantile plot. Please see those comments and other answers before you make an error in interpreting or conveying your distributions' relationship.
I think that scipy.stats.probplot will do what you want. See the documentation for more detail.
import numpy as np
import pylab
import scipy.stats as stats

measurements = np.random.normal(loc=20, scale=5, size=100)
stats.probplot(measurements, dist="norm", plot=pylab)
pylab.show()
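Since the question asks for XY values to pass to a separate plotting function, it may help to note that probplot can also be called without the plot argument, in which case it only returns the computed values. The sketch below assumes the default fit=True, and the variable names are illustrative.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)
measurements = rng.normal(loc=20, scale=5, size=100)

# Without plot=..., probplot computes the values and draws nothing.
# With the default fit=True it also returns a least-squares line fit.
(osm, osr), (slope, intercept, r) = stats.probplot(measurements, dist="norm")

# osm: theoretical quantiles (x), osr: the sorted measurements (y)
xy = np.column_stack([osm, osr])
```

For normally distributed data like this, slope and intercept should land roughly near the scale and location of the sample, and r should be close to 1.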
Using qqplot from statsmodels.api is another option:
Very basic example:
import numpy as np
import statsmodels.api as sm
import pylab

test = np.random.normal(0, 1, 1000)
sm.qqplot(test, line='45')
pylab.show()
Documentation and more examples are in the statsmodels documentation for qqplot.
If you need to do a QQ plot of one sample vs. another, statsmodels includes qqplot_2samples(). Like Ricky Robinson in a comment above, this is what I think of as a QQ plot, as opposed to a probability plot, which compares a sample against a theoretical distribution.
http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.gofplots.qqplot_2samples.html
I came up with this. Maybe you can improve it. Especially the method of generating the quantiles of the distribution seems cumbersome to me.
You could replace np.random.normal with any other distribution from np.random to compare data against other distributions.
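#!/usr/bin/env python
import numpy as np

measurements = np.random.normal(loc=20, scale=5, size=100000)

def qq_plot(data, sample_size):
    """Return a (sample_size, 2) array: sorted data vs. sorted normal draws."""
    qq = np.ones([sample_size, 2])
    # Note: this shuffles `data` in place before taking a random subsample
    np.random.shuffle(data)
    qq[:, 0] = np.sort(data[0:sample_size])
    # Sorted random standard-normal draws stand in for theoretical quantiles
    qq[:, 1] = np.sort(np.random.normal(size=sample_size))
    return qq

print(qq_plot(measurements, 1000))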
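One possible way to make the quantile generation less cumbersome is to compute the theoretical quantiles directly from the distribution's inverse CDF (ppf in scipy.stats) instead of sorting random draws. This is only a sketch: the name qq_points and the (i - 0.5) / n plotting positions are assumptions, and other position conventions exist.

```python
import numpy as np
from scipy import stats

def qq_points(data, dist=stats.norm):
    """Return an (n, 2) array: theoretical quantiles, sorted data."""
    y = np.sort(np.asarray(data, dtype=float))  # works on a copy of the input
    n = y.size
    # Plotting positions (i - 0.5) / n: one common convention (assumed here)
    probs = (np.arange(1, n + 1) - 0.5) / n
    # Inverse CDF gives the theoretical quantile for each probability
    x = dist.ppf(probs)
    return np.column_stack([x, y])

rng = np.random.default_rng(0)
measurements = rng.normal(loc=20, scale=5, size=1000)
qq = qq_points(measurements)
```

Because the theoretical quantiles are deterministic, repeated calls give the same x values, and no subsampling or shuffling of the data is needed. Passing another scipy.stats distribution, such as stats.uniform, compares the data against that distribution instead.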