
Numpy percentiles with linear interpolation - wrong value?

Tags: python, numpy, percentile, linear-interpolation

The linear interpolation formula for percentiles is:

linear: i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.

Suppose I have this list with 16 observations:

test = [0, 1, 5, 5, 5, 6, 6, 7, 7, 8, 11, 12, 21, 23, 23, 24]

I convert it to a NumPy array and calculate the 85th percentile using linear interpolation:

import numpy as np

np_test = np.asarray(test)
np.percentile(np_test, 85, interpolation='linear')  # NumPy >= 1.22 names this keyword `method`

The result I get is 22.5. However, I don't think that's correct. The index of the 85th percentile is 0.85 * 16 = 13.6, so the fractional part is 0.6. The 13th value is 21, so i = 21. The 14th value is 23, so j = 23. The linear formula should then yield:

21 + (23 - 21) * 0.6 = 21 + 2 * 0.6 = 21 + 1.2 = 22.2

By my calculation the answer should be 22.2. Why am I getting 22.5 instead?


asked Feb 15 '18 02:02

jerbear

1 Answer

len(test) is 16, but the distance between the first and the last element, measured in index steps, is one less: d = 16 - 1 = 15. NumPy places the 85th percentile at that fraction of this distance, so its 0-based index is d * 0.85 = 15 * 0.85 = 12.75. The neighbouring values are test[12] = 21 and test[13] = 23. Interpolating linearly over the fractional part 0.75 gives 21 + 0.75 * (23 - 21) = 22.5, so 22.5 is the correct result under NumPy's definition.
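The arithmetic above can be sketched in plain Python. This is only an illustration of the rule described in this answer, not NumPy's actual implementation, and the function name `linear_percentile` is made up for this example:

```python
import math

def linear_percentile(values, q):
    """Percentile with linear interpolation between the two nearest values.

    Hypothetical helper: assumes the fractional index lies q/100 of the
    way along the 0-based index range [0, N - 1], as described above.
    """
    data = sorted(values)
    n = len(data)
    index = q * (n - 1) / 100      # 85 * 15 / 100 = 12.75 for this data
    lower = math.floor(index)      # 12
    upper = min(lower + 1, n - 1)  # 13, clamped so q = 100 stays in range
    fraction = index - lower       # 0.75
    return data[lower] + (data[upper] - data[lower]) * fraction

test = [0, 1, 5, 5, 5, 6, 6, 7, 7, 8, 11, 12, 21, 23, 23, 24]
result = linear_percentile(test, 85)
print(result)  # 22.5
```

This matches the 22.5 that np.percentile reports in the question. Replacing (n - 1) with n and counting from 1, as the question does, is what produces 22.2 instead.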

From the Notes section of the documentation of numpy.percentile():

Given a vector V of length N, the q-th percentile of V is the value q/100 of the way from the minimum to the maximum in a sorted copy of V.

The key here is, in my opinion, "the way from the minimum to the maximum". Say we number the elements from 1 to 16. Then the first element sits at "position" 1 and the last element of test at "position" 16 (along the "coordinate axis of indices"), so the distance between them is 16 - 1 = 15, not 16.
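To make the "way from the minimum to the maximum" reading concrete, here is a small sketch that maps a percentile to a 1-based position on that index axis. The helper name `position_1_based` is hypothetical and only restates the reasoning above:

```python
def position_1_based(n, q):
    """1-based position lying q/100 of the way from position 1 to position n.

    Hypothetical helper: the distance covered is n - 1, not n.
    """
    return 1 + q * (n - 1) / 100

n = 16  # number of observations in test
positions = {q: position_1_based(n, q) for q in (0, 50, 85, 100)}
for q, pos in positions.items():
    print(f"q={q}: position {pos} of {n}")
```

With 16 observations, q = 0 lands on position 1 (the minimum), q = 100 on position 16 (the maximum), and q = 85 on position 13.75, which is three quarters of the way from the 13th value (21) to the 14th value (23).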


answered Oct 13 '22 09:10

AGN Gazer
