 

Quantile-Quantile Plot using SciPy

How would you create a qq-plot using Python?

Assume that you have a large set of measurements and are using some plotting function that takes XY-values as input. The function should plot the quantiles of the measurements against the corresponding quantiles of some distribution (normal, uniform, ...).

The resulting plot then lets us evaluate whether or not our measurements follow the assumed distribution.
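For reference, here is one common way to turn this into XY-values: sort the n measurements and pair the i-th smallest with the theoretical quantile of the assumed distribution (CDF F) at a plotting position p_i. The (i - 0.5)/n plotting position used here is an assumption; R, SciPy, and statsmodels each use slightly different formulas.

```latex
x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}, \qquad
p_i = \frac{i - 0.5}{n}, \qquad
\text{plotted points: } \left( F^{-1}(p_i),\; x_{(i)} \right), \quad i = 1, \dots, n
```

If the points fall close to a straight line, the data are consistent with F up to a location and scale change.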

http://en.wikipedia.org/wiki/Quantile-quantile_plot

Both R and Matlab provide ready-made functions for this, but I am wondering what the cleanest method for implementing it in Python would be.

Asked by John, Dec 13 '12 17:12


People also ask

How do you plot a quantile plot in Python?

A Q-Q plot, short for "quantile-quantile" plot, is often used to assess whether or not a set of data potentially came from some theoretical distribution. In most cases, this type of plot is used to determine whether or not a set of data follows a normal distribution. In Python, scipy.stats.probplot and statsmodels' qqplot both compare against a normal distribution by default.

What is quantile-quantile plot used for?

The quantile-quantile (q-q) plot is a graphical technique for determining if two data sets come from populations with a common distribution. A q-q plot is a plot of the quantiles of the first data set against the quantiles of the second data set.
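When the two data sets have the same size, the i-th smallest value of each estimates the same quantile, so sorting both is enough to get the q-q coordinates. The minimal sketch below assumes equal sizes; the function name qq_points_two_samples is hypothetical.

```python
import numpy as np


def qq_points_two_samples(a, b):
    """Q-Q coordinates for two samples of equal size (hypothetical helper).

    With equal sizes, the i-th smallest value of each sample estimates the
    same quantile, so sorting both samples is enough.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("this sketch assumes both samples have the same size")
    return a, b


rng = np.random.default_rng(0)
first = rng.normal(size=500)
second = first + 3.0  # same distribution shape, location shifted by 3
x, y = qq_points_two_samples(first, second)
# The points lie on the line y = x + 3, which signals a pure location shift
```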


4 Answers

Update: As folks have pointed out, this answer is not correct: a probplot is different from a quantile-quantile plot. Please see those comments and the other answers before you make an error in interpreting or conveying your distributions' relationship.

I think that scipy.stats.probplot will do what you want. See the documentation for more detail.

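import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

# 100 draws from a normal distribution with mean 20 and standard deviation 5
measurements = np.random.normal(loc=20, scale=5, size=100)

# Plot the ordered sample against theoretical normal quantiles, plus a fitted line
stats.probplot(measurements, dist="norm", plot=plt)
plt.show()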
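If you only need the numbers (for example, to feed your own XY plotting function), probplot can be called without plot. With the default fit=True it returns the theoretical quantiles, the ordered data, and a least-squares line. The sketch below assumes a larger sample than above so that the fitted values are stable.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)
measurements = rng.normal(loc=20, scale=5, size=10_000)

# Without plot=..., probplot only computes what it would draw:
#   osm: theoretical quantiles of the standard normal
#   osr: the ordered measurements
(osm, osr), (slope, intercept, r) = stats.probplot(measurements, dist="norm")

# For normal data the fitted line has slope close to the scale (5),
# intercept close to the location (20), and r close to 1
print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={r:.4f}")
```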


Answered by Geoff, Sep 22 '22 07:09


Using qqplot from statsmodels.api is another option:

Very basic example:

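import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# 1000 draws from a standard normal distribution
test = np.random.normal(0, 1, 1000)

# line='45' adds the reference line y = x
sm.qqplot(test, line='45')
plt.show()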
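The line='45' reference line y = x only makes sense when the sample is on the same scale as the standard normal, as test is here. For data such as measurements with loc=20 and scale=5 you would standardize first; statsmodels' fit=True and line='s' options exist for this, but the sketch below does it by hand with NumPy and SciPy so the idea is visible. The helper name standardized_qq_points and the (i - 0.5)/n plotting positions are assumptions.

```python
import numpy as np
import scipy.stats as stats


def standardized_qq_points(data):
    """Hypothetical helper: Q-Q coordinates against N(0, 1) after standardizing.

    Plotting positions (i - 0.5) / n are an assumption; libraries differ slightly.
    """
    data = np.asarray(data, dtype=float)
    z = np.sort((data - data.mean()) / data.std(ddof=1))
    n = z.size
    p = (np.arange(1, n + 1) - 0.5) / n
    return stats.norm.ppf(p), z


rng = np.random.default_rng(1)
measurements = rng.normal(loc=20, scale=5, size=1000)
theoretical, sample = standardized_qq_points(measurements)
# After standardizing, the points hug the 45-degree line y = x
print(np.median(np.abs(sample - theoretical)))
```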


Documentation and more examples are in the statsmodels documentation for qqplot.

Answered by Akavall, Sep 21 '22 07:09


If you need to do a QQ plot of one sample vs. another, statsmodels includes qqplot_2samples(). Like Ricky Robinson in a comment above, I think of this as a QQ plot, as opposed to a probability plot, which compares a sample against a theoretical distribution.

http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.gofplots.qqplot_2samples.html
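When the two samples differ in size, sorting both no longer pairs matching quantiles. One common approach, sketched here by hand with NumPy (not necessarily what qqplot_2samples does internally), is to keep the smaller sample's sorted values and interpolate the larger sample's quantiles at the same plotting positions. The helper name qq_points_unequal and the (i - 0.5)/n positions are assumptions.

```python
import numpy as np


def qq_points_unequal(a, b):
    """Hypothetical helper: Q-Q coordinates for samples that may differ in size.

    The smaller sample is sorted as-is; the larger one is summarized by
    np.quantile at the smaller sample's plotting positions (i - 0.5) / n.
    Returns (quantiles of a, quantiles of b).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_is_small = a.size <= b.size
    small, large = (a, b) if a_is_small else (b, a)
    n = small.size
    p = (np.arange(1, n + 1) - 0.5) / n
    small_q = np.sort(small)
    large_q = np.quantile(large, p)  # linear interpolation by default
    return (small_q, large_q) if a_is_small else (large_q, small_q)


rng = np.random.default_rng(2)
x_q, y_q = qq_points_unequal(rng.normal(size=200), rng.normal(loc=1.0, size=5000))
print(x_q.shape, y_q.shape)  # both have 200 points
```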

Answered by ccap, Sep 23 '22 07:09


I came up with this. Maybe you can improve it. Especially the method of generating the quantiles of the distribution seems cumbersome to me.

You could replace np.random.normal with any other distribution from np.random to compare data against other distributions.

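#!/usr/bin/env python3

import numpy as np

measurements = np.random.normal(loc=20, scale=5, size=100000)

def qq_plot(data, sample_size):
    """Return a (sample_size, 2) array: sorted subsample vs. sorted normal draws."""
    qq = np.empty([sample_size, 2])
    # Random subsample without shuffling the caller's array in place
    subsample = np.random.choice(data, size=sample_size, replace=False)
    qq[:, 0] = np.sort(subsample)
    # Sorted draws from a standard normal stand in for its quantiles
    qq[:, 1] = np.sort(np.random.normal(size=sample_size))
    return qq

print(qq_plot(measurements, 1000))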
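One way to make the distribution's quantiles less cumbersome (an alternative to the code above, not the author's method) is to evaluate the inverse CDF, ppf, of a scipy.stats distribution at fixed plotting positions instead of sorting random draws. This removes sampling noise from the theoretical axis and works with any frozen scipy.stats distribution, e.g. stats.expon() or stats.uniform(). The helper name qq_quantiles and the (i - 0.5)/n plotting positions are assumptions.

```python
import numpy as np
import scipy.stats as stats


def qq_quantiles(data, dist=stats.norm()):
    """Hypothetical helper: (n, 2) array of sorted data vs. quantiles of `dist`.

    `dist` is any frozen scipy.stats distribution; the plotting positions
    (i - 0.5) / n are an assumption, other conventions exist.
    """
    data = np.sort(np.asarray(data, dtype=float))
    n = data.size
    p = (np.arange(1, n + 1) - 0.5) / n
    qq = np.empty((n, 2))
    qq[:, 0] = data
    qq[:, 1] = dist.ppf(p)  # deterministic quantiles, no random draws
    return qq


rng = np.random.default_rng(3)
measurements = rng.exponential(scale=2.0, size=1000)
qq = qq_quantiles(measurements, dist=stats.expon(scale=2.0))
```

Swapping dist for another frozen distribution compares the same data against a different model without touching the rest of the code.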
Answered by John, Sep 24 '22 07:09