From my understanding, NumPy's percentile computes the q-th percentile of the data.
But how exactly does it do that?
Say we have x = np.array([1.3, 1.7, 2.4, 2.8, 3.5, 5.6, 6.6, 7.7, 8.8, 9.9])
(10 floats).
If I do np.percentile(x, 100), it gives back 9.9000000000000004.
If I do np.percentile(x, 90), it should return 8.8, right? But it gives back 8.9100000000000001.
Why are there such differences? Are they acceptable?
Since version 1.9.0, NumPy's percentile function has an interpolation
parameter, which the docs describe like this (in NumPy 1.22 it was renamed to method, and the interpolation keyword was deprecated):
interpolation : {‘linear’, ‘lower’, ‘higher’, ‘midpoint’, ‘nearest’}
This optional parameter specifies the interpolation method to use, when the desired quantile lies between two data points i and j:
- linear: i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.
- lower: i.
- higher: j.
- nearest: i or j whichever is nearest.
- midpoint: (i + j) / 2.
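To make these five rules concrete, here is a minimal pure-Python sketch, assuming 1-D input and the 0-based fractional position (N - 1) * q / 100 that the default linear method uses. The helper name percentile_sketch is hypothetical: it mimics the documented behaviour and is not NumPy's source code. The tie-breaking for 'nearest' (Python's round(), which sends an exact .5 to the even position) is also an assumption.

```python
import math


def percentile_sketch(data, q, method="linear"):
    """Hypothetical helper mimicking the documented np.percentile rules
    for 1-D input (not NumPy's actual implementation)."""
    v = sorted(data)
    # 0-based fractional position of the q-th percentile:
    # q=0 -> first element, q=100 -> last element.
    pos = (len(v) - 1) * q / 100
    lo = math.floor(pos)
    hi = math.ceil(pos)  # equals lo when pos is a whole number
    i, j = v[lo], v[hi]
    fraction = pos - lo
    if method == "linear":
        return i + (j - i) * fraction
    if method == "lower":
        return i
    if method == "higher":
        return j
    if method == "midpoint":
        return (i + j) / 2
    if method == "nearest":
        # Assumption: exact .5 ties go to the even position, as round() does.
        return v[round(pos)]
    raise ValueError(f"unknown method: {method!r}")


x = [1.3, 1.7, 2.4, 2.8, 3.5, 5.6, 6.6, 7.7, 8.8, 9.9]
for m in ("linear", "lower", "higher", "midpoint", "nearest"):
    print(m, percentile_sketch(x, 90, m))
```

With the question's data this should print roughly 8.91 for linear, 8.8 for lower and nearest, 9.9 for higher, and 9.35 for midpoint. The last digits of the float results may differ slightly from what NumPy prints.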
It defaults to linear. If you want to get 8.8 from your example, run:
np.percentile(x, 90, interpolation='lower')  # on NumPy >= 1.22, use method='lower'
From my understanding, the 90th percentile does not have to be an element of the input array.
From the documentation:
Given a vector V of length N, the q-th percentile of V is the q-th ranked value in a sorted copy of V. The values and distances of the two nearest neighbors as well as the interpolation parameter will determine the percentile if the normalized ranking does not match q exactly. This function is the same as the median if q=50, the same as the minimum if q=0 and the same as the maximum if q=100.
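For the default linear method, the q-th percentile of the sorted V sits at the 0-based fractional position (N - 1) * q / 100. Working this through for the question's data (N = 10, q = 90) shows where 8.91 comes from:

```latex
\begin{aligned}
\text{position} &= (N-1)\,\frac{q}{100} = 9 \times 0.90 = 8.1 \\
i &= V_{8} = 8.8, \qquad j = V_{9} = 9.9, \qquad \text{fraction} = 0.1 \\
\text{result} &= i + (j - i)\cdot\text{fraction} = 8.8 + (9.9 - 8.8)(0.1) = 8.91
\end{aligned}
```

With q = 100 the position is exactly 9, so the result is V_9 = 9.9 itself, with no interpolation involved.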
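The slight difference between np.percentile(x, 100) and 9.9 is the well-known floating-point representation issue: 9.9 has no exact binary representation, and 9.9000000000000004 is simply the stored value of 9.9 printed with 17 significant digits.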
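A quick way to see this with the standard library alone: the float64 closest to 9.9 is slightly larger than 9.9, and printing it with 17 significant digits (which the NumPy version in the question appears to do) exposes that error. The snippet uses only Python's decimal module and built-in formatting, not NumPy.

```python
from decimal import Decimal

# Exact decimal value of the float64 stored for the literal 9.9.
stored = Decimal(9.9)
# The same value printed with 17 significant digits.
seventeen = format(9.9, ".17g")
# Shortest string that round-trips back to the same float.
shortest = repr(9.9)

print(stored)     # 9.9000000000000003552713678800500929355621337890625
print(seventeen)  # 9.9000000000000004
print(shortest)   # 9.9
```

This is consistent with np.percentile(x, 100) returning the stored 9.9 (the largest element) exactly; only the display differs.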