The linear interpolation formula for percentiles is:
linear: i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.
Suppose I have this list with 16 observations:
test = [0, 1, 5, 5, 5, 6, 6, 7, 7, 8, 11, 12, 21, 23, 23, 24]
I pass it as a numpy array and calculate the 85th percentile using linear interpolation.
import numpy as np
np_test = np.asarray(test)
np.percentile(np_test, 85, method='linear')  # NumPy < 1.22 spells this keyword interpolation='linear'
The result I get is 22.5. However, I don't think that's correct. The index of the 85th percentile is .85 * 16 = 13.6. Thus, the fractional part is .6. The 13th value is 21, so i = 21. The 14th value is 23, so j = 23. The linear formula should then yield:
21 + (23 - 21) * .6 = 21 + 2 * .6 = 21 + 1.2 = 22.2
The correct answer is 22.2. Why am I getting 22.5 instead?
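To see where 22.2 comes from, here is a minimal sketch of the hand calculation above. It assumes the rank is p * n read as a 1-based position, as in the question. `percentile_n_times_p` is a hypothetical helper name, not a NumPy function.

```python
import math

test = [0, 1, 5, 5, 5, 6, 6, 7, 7, 8, 11, 12, 21, 23, 23, 24]

def percentile_n_times_p(data, pct):
    """Reproduce the question's hand calculation: rank = p * n, read as a 1-based position."""
    values = sorted(data)
    rank = pct / 100 * len(values)       # 0.85 * 16 = 13.6
    whole = math.floor(rank)             # 13, i.e. the 13th value
    if not 1 <= whole < len(values):
        raise ValueError("this sketch only handles ranks between 1 and n")
    fraction = rank - whole              # about 0.6
    i = values[whole - 1]                # 13th value (1-based): 21
    j = values[whole]                    # 14th value (1-based): 23
    return i + (j - i) * fraction

print(percentile_n_times_p(test, 85))    # about 22.2, not NumPy's 22.5
```

The printed value may show a few extra digits from floating-point rounding.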
Definition 3: Using an interpolation approach. To calculate an interpolated percentile, first calculate the rank to use for the percentile: rank = p(n + 1), where p is the percentile (as a fraction) and n is the sample size. For example, with a sample of 11 values, the rank for the 70th percentile is 0.7 * (11 + 1) = 8.4, so the percentile lies 0.4 of the way from the 8th value to the 9th value.
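A minimal Python sketch of this rank = p(n + 1) definition is below. The helper name `percentile_rank_n_plus_1` is hypothetical. Clamping ranks below 1 or above n to the smallest or largest value is an assumption, because sources handle the ends differently. Applied to the 16 observations from the question, it gives a different 85th percentile than NumPy's default, since NumPy's 'linear' method does not use p(n + 1).

```python
import math

def percentile_rank_n_plus_1(data, p):
    """Interpolated percentile using rank = p * (n + 1) on 1-based positions.

    p is a fraction (0.7 for the 70th percentile). Ranks outside [1, n] are
    clamped to the smallest or largest value (an assumption of this sketch).
    """
    values = sorted(data)
    n = len(values)
    rank = p * (n + 1)
    if rank <= 1:
        return values[0]
    if rank >= n:
        return values[-1]
    whole = math.floor(rank)
    fraction = rank - whole
    lower = values[whole - 1]   # value at 1-based position `whole`
    upper = values[whole]       # value at 1-based position `whole + 1`
    return lower + (upper - lower) * fraction

test = [0, 1, 5, 5, 5, 6, 6, 7, 7, 8, 11, 12, 21, 23, 23, 24]
print(percentile_rank_n_plus_1(test, 0.85))  # rank 0.85 * 17 = 14.45 -> between 23 and 23 -> 23.0
```

On NumPy 1.22 or newer, `np.percentile(..., method='weibull')` should follow the same p(n + 1) rule; it is worth confirming against the NumPy documentation for your version.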
Note that the pandas quantile() function expects the percentile as a fraction: for example, pass 0.95 to get the 95th percentile.
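For example, with pandas (assumed to be installed), the 85th percentile of the question's data is requested as 0.85. The default interpolation is 'linear', so the result should match NumPy's default.

```python
import pandas as pd

test = [0, 1, 5, 5, 5, 6, 6, 7, 7, 8, 11, 12, 21, 23, 23, 24]
s = pd.Series(test)

# quantile() expects a fraction: 0.85 for the 85th percentile, not 85.
p85 = s.quantile(0.85)  # default interpolation='linear'
print(p85)  # 22.5
```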
Percentile: numpy.percentile computes the q-th percentile of the given data (array elements) along the specified axis, with q between 0 and 100. Quantile: numpy.quantile computes the q-th quantile of the given data along the specified axis, with q between 0 and 1.
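A short sketch showing that the two functions differ only in the scale of q, using the test data from the question:

```python
import numpy as np

test = np.array([0, 1, 5, 5, 5, 6, 6, 7, 7, 8, 11, 12, 21, 23, 23, 24])

p = np.percentile(test, 85)    # q on the 0-100 scale
q = np.quantile(test, 0.85)    # q on the 0-1 scale
print(p, q)  # 22.5 22.5
```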
NumPy percentile using axis = 0 on a 2-D array: with axis = 0, the percentile is computed along the first axis, that is, down each column, giving one result per column. The example first imports the numpy module as np, then creates a 2-D array and prints it.
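The original example code is not included here, so this is a minimal stand-in; the array values are made up for illustration.

```python
import numpy as np

# A small 2-D array: 2 rows, 3 columns (illustrative values).
arr = np.array([[10, 7, 4],
                [3, 2, 1]])
print(arr)

# axis=0 computes the percentile down each column, giving one value per column.
col_medians = np.percentile(arr, 50, axis=0)
print(col_medians)  # [6.5 4.5 2.5]
```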
NumPy has many useful statistical functions for finding the minimum, maximum, percentiles, standard deviation, and variance of the elements in an array. For example, amin() and amax() return the minimum and maximum of the elements in a given array.
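A brief sketch applying a few of these functions to the test data from the question. Note that np.std and np.var compute the population versions by default (ddof=0).

```python
import numpy as np

test = np.array([0, 1, 5, 5, 5, 6, 6, 7, 7, 8, 11, 12, 21, 23, 23, 24])

print(np.amin(test), np.amax(test))   # 0 24

# Quartiles with the default linear method.
quartiles = np.percentile(test, [25, 50, 75])
print(quartiles)

# Population standard deviation and variance (ddof=0 by default).
print(np.std(test), np.var(test))
```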
len(test) is 16, but the distance between the last element and the first element is one less: d = 16 - 1 = 15 (equivalently, last index minus first index, 15 - 0 = 15). Therefore the index of the 85th percentile is d * 0.85 = 15 * 0.85 = 12.75. Since test[12] = 21 and test[13] = 23, linear interpolation on the fractional part 0.75 gives 21 + 0.75 * (23 - 21) = 22.5.
The correct answer is 22.5.
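The calculation above can be written as a small function. This is a sketch assuming 0-based indices and the (n - 1) * p position; `percentile_linear` is a hypothetical name. Comparing it with np.percentile over every integer percentile is a quick way to check that the rule matches NumPy's default linear method.

```python
import math
import numpy as np

def percentile_linear(data, pct):
    """Linear-interpolation percentile on 0-based positions: index = (n - 1) * p."""
    values = sorted(data)
    index = (len(values) - 1) * pct / 100   # 15 * 0.85 = 12.75 for the test data
    lo = math.floor(index)
    hi = min(lo + 1, len(values) - 1)       # stay in range when pct == 100
    fraction = index - lo
    return values[lo] + (values[hi] - values[lo]) * fraction

test = [0, 1, 5, 5, 5, 6, 6, 7, 7, 8, 11, 12, 21, 23, 23, 24]
print(percentile_linear(test, 85))   # 22.5
print(np.percentile(test, 85))       # 22.5
```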
From the Notes section of the numpy.percentile() documentation:
Given a vector V of length N, the q-th percentile of V is the value q/100 of the way from the minimum to the maximum in a sorted copy of V.
The key here is, in my opinion, "the way from the minimum to the maximum". Say we number the elements from 1 to 16. Then the "position" of the first element is 1, and the "position" (along the "coordinate axis of indices") of the last element in test is 16. Therefore the distance between them is 16 - 1 = 15.
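The same idea in code, as a small sketch: q = 0 and q = 100 land exactly on the minimum and maximum, and the 1-based position 1 + p * (N - 1) is just the 0-based index shifted by one.

```python
import numpy as np

test = [0, 1, 5, 5, 5, 6, 6, 7, 7, 8, 11, 12, 21, 23, 23, 24]
n = len(test)

# q = 0 gives the minimum (0) and q = 100 gives the maximum (24).
print(np.percentile(test, 0), np.percentile(test, 100))

# Numbering positions 1..16, the 85th percentile sits 85% of the
# 15-step way from position 1 to position 16.
position_1_based = 1 + 0.85 * (n - 1)    # 13.75
print(position_1_based - 1)              # 0-based equivalent: 12.75
```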