Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I calculate percentiles with python/numpy?

People also ask

How do you find the 95th percentile in Python?

Note that when using the pandas quantile() function pass the value of the nth percentile as a fractional value. For example, pass 0.95 to get the 95th percentile value.

How do you find the percentile of a DataFrame in Python?

To find percentiles of a numeric column in a DataFrame, or the percentiles of a Series in pandas, the easiest way is to use the pandas quantile() function. You can also use the numpy percentile() function.


You might be interested in the SciPy Stats package. It has the percentile function you're after and many other statistical goodies.

percentile() is available in numpy too.

import numpy as np
a = np.array([1,2,3,4,5])
p = np.percentile(a, 50) # return 50th percentile, e.g median.
print p
3.0

This ticket leads me to believe they won't be integrating percentile() into numpy anytime soon.


By the way, there is a pure-Python implementation of percentile function, in case one doesn't want to depend on scipy. The function is copied below:

## {{{ http://code.activestate.com/recipes/511478/ (r1)
import math
import functools

def percentile(N, percent, key=lambda x:x):
    """
    Find the percentile of a list of values.

    @parameter N - is a list of values. Note N MUST BE already sorted.
    @parameter percent - a float value from 0.0 to 1.0.
    @parameter key - optional key function to compute value from each element of N.

    @return - the percentile of the values
    """
    if not N:
        return None
    k = (len(N)-1) * percent
    f = math.floor(k)
    c = math.ceil(k)
    if f == c:
        return key(N[int(k)])
    d0 = key(N[int(f)]) * (c-k)
    d1 = key(N[int(c)]) * (k-f)
    return d0+d1

# median is 50th percentile.
median = functools.partial(percentile, percent=0.5)
## end of http://code.activestate.com/recipes/511478/ }}}

import numpy as np
a = [154, 400, 1124, 82, 94, 108]
print np.percentile(a,95) # gives the 95th percentile

Here's how to do it without numpy, using only python to calculate the percentile.

import math

def percentile(data, perc: int):
    size = len(data)
    return sorted(data)[int(math.ceil((size * perc) / 100)) - 1]

percentile([10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0], 90)
# 9.0
percentile([142, 232, 290, 120, 274, 123, 146, 113, 272, 119, 124, 277, 207], 50)
# 146