
Quantile-Quantile Plot using SciPy

Tags: python, statistics, scipy

How would you create a qq-plot using Python?

Assume that you have a large set of measurements and are using some plotting function that takes XY values as input. The function should plot the quantiles of the measurements against the corresponding quantiles of some distribution (normal, uniform, ...).

The resulting plot then lets us evaluate whether our measurements follow the assumed distribution or not.

http://en.wikipedia.org/wiki/Quantile-quantile_plot

Both R and Matlab provide ready-made functions for this, but I am wondering what the cleanest method for implementing it in Python would be.

577

asked Dec 13 '12 17:12

John

4 Answers

Update: As folks have pointed out, this answer is not correct. A probplot is different from a quantile-quantile plot. Please see those comments and the other answers before you make an error in interpreting or conveying your distributions' relationship.

I think that scipy.stats.probplot will do what you want. See the documentation for more detail.

import numpy as np
import pylab
import scipy.stats as stats

measurements = np.random.normal(loc=20, scale=5, size=100)

stats.probplot(measurements, dist="norm", plot=pylab)
pylab.show()

Result: https://i.stack.imgur.com/8ZbV4.png
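Since the question asks for XY values to hand to an arbitrary plotting function, here is a minimal sketch of calling probplot without the plot argument, in which case it only returns the values. The fixed seed and the name xy are assumptions made for illustration.

```python
import numpy as np
import scipy.stats as stats

# Fixed seed, chosen only so the sketch is reproducible.
rng = np.random.default_rng(0)
measurements = rng.normal(loc=20, scale=5, size=100)

# Without plot=..., nothing is drawn; probplot just returns the values.
# osm: theoretical quantiles (x), osr: ordered measurements (y).
# slope, intercept, r describe the least-squares line through the points.
(osm, osr), (slope, intercept, r) = stats.probplot(measurements, dist="norm")

# Stack into an (n, 2) array of XY values for any plotting function.
xy = np.column_stack([osm, osr])
```

For normally distributed data, slope and intercept should land near the scale and location of the measurements, and r should be close to 1.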

105

answered Sep 22 '22 07:09

Geoff

Using qqplot of statsmodels.api is another option:

Very basic example:

import numpy as np
import statsmodels.api as sm
import pylab

test = np.random.normal(0, 1, 1000)

sm.qqplot(test, line='45')
pylab.show()


Documentation and more examples are in the statsmodels docs.

answered Sep 21 '22 07:09

Akavall

If you need to do a QQ plot of one sample vs. another, statsmodels includes qqplot_2samples(). Like Ricky Robinson in a comment above, I think of this as a QQ plot, as opposed to a probability plot, which compares a sample against a theoretical distribution.

http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.gofplots.qqplot_2samples.html
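To illustrate what a two-sample QQ plot compares, here is a rough NumPy-only sketch that evaluates both samples at the same probabilities. It is not how qqplot_2samples() is implemented; the helper name qq_two_samples, the choice of 100 quantiles, and the (i - 0.5) / n probabilities are all assumptions for illustration.

```python
import numpy as np

def qq_two_samples(x, y, n_quantiles=100):
    """Return the quantiles of x and y at a shared set of probabilities.

    Hypothetical helper; the probabilities (i - 0.5) / n are one common
    convention among several.
    """
    probs = (np.arange(1, n_quantiles + 1) - 0.5) / n_quantiles
    return np.quantile(x, probs), np.quantile(y, probs)

# Fixed seed, chosen only so the sketch is reproducible.
rng = np.random.default_rng(0)
a = rng.normal(loc=0, scale=1, size=500)
b = rng.normal(loc=0, scale=1, size=800)  # sample sizes need not match

# Plot qa against qb; points near the line y = x suggest similar distributions.
qa, qb = qq_two_samples(a, b)
```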

answered Sep 23 '22 07:09

ccap

I came up with this. Maybe you can improve it. In particular, the method of generating the quantiles of the distribution seems cumbersome to me.

You could replace np.random.normal with any other distribution from np.random to compare data against other distributions.

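#!/usr/bin/env python
import numpy as np

measurements = np.random.normal(loc=20, scale=5, size=100000)

def qq_plot(data, sample_size):
    # Column 0: sorted random subsample of the data.
    # Column 1: sorted random draws from the standard normal distribution.
    qq = np.ones([sample_size, 2])
    # Sample without replacement so the caller's array is not shuffled in place.
    sample = np.random.choice(data, size=sample_size, replace=False)
    qq[:, 0] = np.sort(sample)
    qq[:, 1] = np.sort(np.random.normal(size=sample_size))
    return qq

print(qq_plot(measurements, 1000))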
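One possible way to make the quantile step less cumbersome is to compute the theoretical quantiles directly from the distribution's inverse CDF (ppf in scipy.stats) instead of sorting random draws. This is a sketch of a different technique rather than a fix to the function above; the name qq_points and the plotting positions (i - 0.5) / n are assumptions, and other position conventions exist.

```python
import numpy as np
from scipy import stats

def qq_points(data, dist=stats.norm):
    """Return an (n, 2) array: sorted data in column 0,
    theoretical quantiles of `dist` in column 1.

    Hypothetical helper, keeping the column order of qq_plot above.
    """
    data = np.sort(np.asarray(data, dtype=float))
    n = data.size
    # Plotting positions (i - 0.5) / n, strictly inside (0, 1).
    probs = (np.arange(1, n + 1) - 0.5) / n
    # The inverse CDF gives the theoretical quantile at each position.
    theoretical = dist.ppf(probs)
    return np.column_stack([data, theoretical])

# Fixed seed, chosen only so the sketch is reproducible.
rng = np.random.default_rng(0)
measurements = rng.normal(loc=20, scale=5, size=1000)
qq = qq_points(measurements)
```

Passing another frozen or unfrozen scipy.stats distribution, for example dist=stats.uniform, compares the data against that distribution instead. Because the theoretical quantiles are deterministic, the result does not change between runs for the same data.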

answered Sep 24 '22 07:09

John
