 

What does numpy's percentile function do exactly?

Tags: python, numpy

From my understanding, numpy's percentile computes the q-th percentile(s) of the data.

But how exactly does it do that?


Say I have x = np.array([1.3, 1.7, 2.4, 2.8, 3.5, 5.6, 6.6, 7.7, 8.8, 9.9]) (10 floats).

If I do np.percentile(x, 100), it returns 9.9000000000000004.

If I do np.percentile(x, 90), it should return 8.8, right? But it returns 8.9100000000000001.


Why are there such differences? Are they acceptable?

Jackson Tale asked Oct 26 '15 at 11:10



2 Answers

Since version 1.9.0, NumPy's percentile function has an interpolation parameter, which the docs describe like this:

interpolation : {‘linear’, ‘lower’, ‘higher’, ‘midpoint’, ‘nearest’}
This optional parameter specifies the interpolation method to use, when the desired quantile lies between two data points i and j:

  • linear: i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.
  • lower: i.
  • higher: j.
  • nearest: i or j, whichever is nearest.
  • midpoint: (i + j) / 2.
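As a rough sketch of what the default linear rule does, the hypothetical helper linear_percentile below (not part of NumPy) assumes the position of the q-th percentile in the sorted data is q/100 * (n - 1), which matches NumPy's default behaviour, and then applies i + (j - i) * fraction between the two neighbouring values:

```python
import numpy as np


def linear_percentile(values, q):
    """Hypothetical helper: q-th percentile using the 'linear' rule.

    Assumes the fractional position in the sorted data is q/100 * (n - 1).
    """
    v = np.sort(np.asarray(values, dtype=float))
    pos = q / 100.0 * (len(v) - 1)   # fractional index into the sorted data
    lo = int(np.floor(pos))          # index of the lower neighbour i
    hi = min(lo + 1, len(v) - 1)     # index of the upper neighbour j (clipped at the end)
    fraction = pos - lo              # fractional part of the index
    return v[lo] + (v[hi] - v[lo]) * fraction


x = np.array([1.3, 1.7, 2.4, 2.8, 3.5, 5.6, 6.6, 7.7, 8.8, 9.9])
print(linear_percentile(x, 90))  # position 8.1: 8.8 + (9.9 - 8.8) * 0.1, about 8.91
print(np.percentile(x, 90))      # NumPy's default gives (almost exactly) the same value
```

For the question's array, n = 10, so the 90th percentile sits at position 0.9 * 9 = 8.1, between 8.8 (index 8) and 9.9 (index 9). That fractional 0.1 is why the default answer is 8.91 rather than 8.8.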

It defaults to linear. If you want to get 8.8 from your example, run:

np.percentile(x, 90, interpolation='lower')  # NumPy >= 1.22: method='lower'
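To compare all five options on the question's data, a small sketch like the following can help. It assumes NumPy 1.22 or later, where the keyword is method; on NumPy 1.9 to 1.21 the same values are passed as interpolation=... (that name is deprecated from 1.22 on).

```python
import numpy as np

x = np.array([1.3, 1.7, 2.4, 2.8, 3.5, 5.6, 6.6, 7.7, 8.8, 9.9])

# Assumes NumPy >= 1.22 (keyword `method`); older versions use `interpolation`.
methods = ["linear", "lower", "higher", "nearest", "midpoint"]
results = {m: float(np.percentile(x, 90, method=m)) for m in methods}

for m, value in results.items():
    print(f"{m:>8}: {value}")
```

Here lower and nearest both give 8.8, because the 90th percentile falls at position 8.1, and 8.8 is both the lower and the closer neighbour; higher gives 9.9 and midpoint gives their average.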
Carsten answered Nov 13 '22 at 06:11



From my understanding, the 90th percentile does not have to be an element of the input array.

From the documentation:

Given a vector V of length N, the q-th percentile of V is the q-th ranked value in a sorted copy of V. The values and distances of the two nearest neighbors as well as the interpolation parameter will determine the percentile if the normalized ranking does not match q exactly. This function is the same as the median if q=50, the same as the minimum if q=0 and the same as the maximum if q=100.
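The special cases in that quote, and the point that the 90th percentile need not be one of the inputs, can be checked directly on the question's array. This is a small sketch; np.isclose is used for the median comparison as a precaution, in case the two functions round their final arithmetic slightly differently.

```python
import numpy as np

x = np.array([1.3, 1.7, 2.4, 2.8, 3.5, 5.6, 6.6, 7.7, 8.8, 9.9])

# Special cases named in the documentation quote.
print(np.isclose(np.percentile(x, 50), np.median(x)))  # q=50 behaves like the median
print(np.percentile(x, 0) == x.min())                   # q=0 is the minimum
print(np.percentile(x, 100) == x.max())                 # q=100 is the maximum

# With the default linear interpolation, q=90 lands between 8.8 and 9.9,
# so the result is not an element of x.
p90 = np.percentile(x, 90)
print(p90 in x)
```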

The slight difference between np.percentile(x, 100) and 9.9 is a well-known floating-point representation issue: 9.9 has no exact binary representation, and printing the nearest double with 17 significant digits gives 9.9000000000000004.
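A quick check on the question's array suggests the returned value is exactly the same double as the literal 9.9, and only the printed precision differs. The standard-library decimal module is used here just to display the exact stored value.

```python
from decimal import Decimal

import numpy as np

x = np.array([1.3, 1.7, 2.4, 2.8, 3.5, 5.6, 6.6, 7.7, 8.8, 9.9])
top = float(np.percentile(x, 100))

print(top == 9.9)           # True: it is exactly the largest element
print(format(top, ".17g"))  # 9.9000000000000004 (17 significant digits)
print(Decimal(top))         # exact decimal expansion of the stored double
```

The same reasoning applies to 8.9100000000000001: it is the double closest to the interpolated value, shown with more digits than are meaningful.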

jkalden answered Nov 13 '22 at 06:11
