Python: inverse empirical cumulative distribution function (ECDF)?

Tags:

python

numpy

statsmodels

We can create the ECDF with

import numpy as np
from statsmodels.distributions.empirical_distribution import ECDF
ecdf = ECDF([3, 3, 1, 4])

and then obtain the ECDF at a point x with

ecdf(x)

However, what if I want to know the x at the 97.5th percentile?

From http://www.statsmodels.org/stable/generated/statsmodels.distributions.empirical_distribution.ECDF.html?highlight=ecdf, it seems this has not been implemented.

Is there any way to do this, or any other library that can?

asked May 23 '17 10:05

cqcn1991

3 Answers

Since the empirical CDF just places a mass of 1/n at each data point, the 97.5% quantile is just the data point that is at least as large as 97.5% of the sample. To find this value, you can simply sort the data in ascending order and take the value at position 0.975n.

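import math

sample = [1, 5, 2, 10, -19, 4, 7, 2, 0, -1]
n = len(sample)
ordered = sorted(sample)
# ceil(0.975 * n) is the 1-based rank; subtract 1 for the 0-based index
print(ordered[math.ceil(n * 0.975) - 1])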

Which produces:

10

Recalling that for discrete distributions (like the empirical CDF) the quantile function at p is the smallest x whose CDF value is at least p, we realize that we have to take the 0.975n-th (rounded up) smallest value.
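If you need this for arbitrary probabilities, the rule above can be wrapped in a small helper. This is only a sketch, not anything provided by statsmodels: the name ecdf_quantile is made up for illustration, and rounding n * p before taking the ceiling is my own assumption to absorb floating-point noise.

```python
import math

def ecdf_quantile(sample, p):
    """Smallest data point whose ECDF value is at least p (hypothetical helper)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    ordered = sorted(sample)
    if not ordered:
        raise ValueError("sample must not be empty")
    n = len(ordered)
    # 1-based rank ceil(n * p). Round first so that float noise such as
    # 10 * 0.7 == 7.000000000000001 is not pushed up to rank 8
    # (the 9-digit tolerance is an assumption, not a standard).
    rank = max(1, math.ceil(round(n * p, 9)))
    return ordered[rank - 1]

sample = [1, 5, 2, 10, -19, 4, 7, 2, 0, -1]
print(ecdf_quantile(sample, 0.975))  # 10
print(ecdf_quantile(sample, 0.5))    # 2
```

Because it always returns one of the observed data points, the result is a step function of p, exactly like the ECDF itself.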

answered Oct 20 '22 22:10

Benjamin Doughty

My suggestion is to use linear interpolation, since distribution functions can only be estimated well from fairly large samples anyway. The interpolating line segments are easy to obtain because their endpoints occur at the distinct values in the sample.

import statsmodels.distributions.empirical_distribution as edf
from scipy.interpolate import interp1d
import numpy as np
import matplotlib.pyplot as plt

sample = [1,4,2,6,5,5,3,3,5,7]
sample_edf = edf.ECDF(sample)

slope_changes = sorted(set(sample))

sample_edf_values_at_slope_changes = [ sample_edf(item) for item in slope_changes]
inverted_edf = interp1d(sample_edf_values_at_slope_changes, slope_changes)

x = np.linspace(0.1, 1)
y = inverted_edf(x)
plt.plot(x, y, 'ro', x, y, 'b-')
plt.show()

print ('97.5 percentile:', inverted_edf(0.975))

It produces the following output,

97.5 percentile: 6.75

and this graph:

[Figure: inverted empirical CDF]
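If you would rather not depend on statsmodels and scipy, the same interpolation can probably be done with NumPy alone. The following is a sketch under that assumption; inverted_ecdf is a name I made up. Note one difference: np.interp clamps probabilities outside the known range, whereas interp1d raises an error by default.

```python
import numpy as np

sample = [1, 4, 2, 6, 5, 5, 3, 3, 5, 7]

# Distinct sample values and the ECDF evaluated at each of them.
values, counts = np.unique(sample, return_counts=True)
ecdf_at_values = np.cumsum(counts) / len(sample)

def inverted_ecdf(p):
    """Linearly interpolated inverse ECDF (hypothetical helper name)."""
    # np.interp clamps: any p below ecdf_at_values[0] maps to the smallest value.
    return np.interp(p, ecdf_at_values, values)

print('97.5 percentile:', inverted_ecdf(0.975))
```

For this sample the result should agree with the 6.75 above, up to floating-point rounding.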

answered Oct 20 '22 22:10

Bill Bell

numpy.quantile(x, q=0.975) returns the value along array x at which the ECDF reaches 0.975. By default it interpolates linearly between neighbouring data points.

Similarly, pandas provides Series.quantile(q=0.975) and DataFrame.quantile(q=0.975).
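To make the difference between interpolation and the step-function inverse concrete, here is a short sketch. It assumes NumPy 1.22 or newer, where numpy.quantile accepts a method argument; the variable names are just for illustration.

```python
import numpy as np

x = np.array([1, 5, 2, 10, -19, 4, 7, 2, 0, -1])

# Default method ("linear") interpolates between the two nearest order statistics.
interpolated = np.quantile(x, 0.975)

# method="inverted_cdf" (assumes NumPy >= 1.22) returns an actual data point,
# i.e. the smallest value whose ECDF is at least 0.975.
step = np.quantile(x, 0.975, method="inverted_cdf")

print(interpolated, step)
```

Here the default gives a value between 7 and 10 (about 9.325), while inverted_cdf gives 10, matching the sort-and-index approach.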

answered Oct 20 '22 22:10

mathsmodel
