
Numpy percentiles with linear interpolation - wrong value?

The linear interpolation formula for percentiles is:

linear: i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.

Suppose I have this list with 16 observations:

test = [0, 1, 5, 5, 5, 6, 6, 7, 7, 8, 11, 12, 21, 23, 23, 24]

I pass it as a numpy array and calculate the 85th percentile using linear interpolation.

import numpy as np

np_test = np.asarray(test)
np.percentile(np_test, 85, interpolation='linear')

The result I get is 22.5. However, I don't think that's correct. The index of the 85th percentile is .85 * 16 = 13.6. Thus, the fractional part is .6. The 13th value is 21, so i = 21. The 14th value is 23, so j = 23. The linear formula should then yield:

21 + (23 - 21) * .6 = 21 + 2 * .6 = 21 + 1.2 = 22.2

The correct answer is 22.2. Why am I getting 22.5 instead?

jerbear asked Feb 15 '18 02:02





1 Answer

len(test) is 16, but the distance between the last element and the first element is one less: d = 16 - 1 = 15 (equivalently, index 15 minus index 0). Therefore the zero-based index of the 85th percentile is d * 0.85 = 15 * 0.85 = 12.75. test[12] = 21 and test[13] = 23, so linear interpolation over the fractional part gives 21 + 0.75 * (23 - 21) = 22.5. The correct answer is 22.5.
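The calculation above can be checked by hand. The following is only a minimal sketch, assuming the position rule (N - 1) * q / 100 described in this answer; `linear_percentile` is a hypothetical helper written for illustration, not a NumPy function.

```python
import math
import numpy as np

test = [0, 1, 5, 5, 5, 6, 6, 7, 7, 8, 11, 12, 21, 23, 23, 24]

def linear_percentile(values, q):
    """Hypothetical helper: q-th percentile using the (N - 1) * q / 100 position rule."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100    # zero-based fractional index, here 12.75
    lo = math.floor(pos)               # index of i (the lower neighbour)
    hi = min(lo + 1, len(data) - 1)    # index of j (the upper neighbour), clamped at the end
    fraction = pos - lo                # fractional part of the index
    return data[lo] + (data[hi] - data[lo]) * fraction

manual = linear_percentile(test, 85)
numpy_result = float(np.percentile(np.asarray(test), 85))  # default is linear interpolation
print(manual, numpy_result)
```

Both values should agree, which suggests the difference from 22.2 comes from how the fractional index is computed rather than from the interpolation formula itself.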

From the Notes section of the documentation of numpy.percentile():

Given a vector V of length N, the q-th percentile of V is the value q/100 of the way from the minimum to the maximum in a sorted copy of V.

The key here is, in my opinion, "the way from the minimum to the maximum". Let's say we number elements from 1 to 16. Then the "position" of the first element is 1 and the "position" (along the "coordinate axis of indices") of the last element in test is 16. Therefore the distance between them is 16-1=15.

AGN Gazer answered Oct 13 '22 09:10
