I want to pass the numpy percentile() function through pandas' agg() function, as I do below with various other numpy statistics functions.
Right now I have a dataframe that looks like this:
    AGGREGATE  MY_COLUMN
    A          10
    A          12
    B          5
    B          9
    A          84
    B          22
And my code looks like this:
    grouped = dataframe.groupby('AGGREGATE')
    column = grouped['MY_COLUMN']
    column.agg([np.sum, np.mean, np.std, np.median, np.var, np.min, np.max])
The above code works, but I want to do something like
column.agg([np.sum, np.mean, np.percentile(50), np.percentile(95)])
I.e., specify various percentiles to return from agg().
How should this be done?
agg() takes a function or a list of functions to apply to a series, or even to each element of the series separately. When given a list of functions, agg() returns one result per function.
Note that the pandas quantile() function expects the nth percentile as a fractional value. For example, pass 0.95 to get the 95th percentile value.
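As a rough illustration of the fractional convention, here is a minimal sketch that calls quantile() directly on the grouped column. The DataFrame is rebuilt from the sample data in the question; the variable names `df` and `quantiles` are only for this example, and the `unstack()` call is just one way to get one column per quantile.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "AGGREGATE": ["A", "A", "B", "B", "A", "B"],
    "MY_COLUMN": [10, 12, 5, 9, 84, 22],
})
column = df.groupby("AGGREGATE")["MY_COLUMN"]

# quantile() takes fractions in [0, 1]: 0.5 is the median,
# 0.95 is the 95th percentile.
# Passing a list gives a MultiIndex result; unstack() turns the
# quantile level into columns labelled 0.5 and 0.95.
quantiles = column.quantile([0.5, 0.95]).unstack()
print(quantiles)
```

This sidesteps agg() entirely, so it only helps when percentiles are the only statistics needed.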
Perhaps not super efficient, but one way would be to create a function yourself:
    def percentile(n):
        def percentile_(x):
            return np.percentile(x, n)
        percentile_.__name__ = 'percentile_%s' % n
        return percentile_
Then include this in your agg:
    In [11]: column.agg([np.sum, np.mean, np.std, np.median, np.var, np.min, np.max, percentile(50), percentile(95)])
    Out[11]:
               sum       mean        std  median          var  amin  amax  percentile_50  percentile_95
    AGGREGATE
    A          106  35.333333  42.158431      12  1777.333333    10    84             12           76.8
    B           36  12.000000   8.888194       9    79.000000     5    22              9           20.7
Not sure this is how it should be done though...
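For anyone who wants to try the closure approach above end to end, here is a self-contained sketch using the sample data from the question. One assumption on my part: the built-in statistics are passed as the string aliases "sum" and "mean" rather than np.sum and np.mean, since recent pandas versions may warn about the numpy callables. The names `df` and `result` are only for this example.

```python
import numpy as np
import pandas as pd


def percentile(n):
    """Return a function that computes the n-th percentile (n in 0..100)."""
    def percentile_(x):
        return np.percentile(x, n)
    # agg() labels each output column with the function's __name__,
    # so give every generated function a distinct, readable name.
    percentile_.__name__ = "percentile_%s" % n
    return percentile_


# Sample data from the question.
df = pd.DataFrame({
    "AGGREGATE": ["A", "A", "B", "B", "A", "B"],
    "MY_COLUMN": [10, 12, 5, 9, 84, 22],
})
column = df.groupby("AGGREGATE")["MY_COLUMN"]

# String aliases for the built-ins (an assumption of this sketch),
# plus the generated percentile functions.
result = column.agg(["sum", "mean", percentile(50), percentile(95)])
print(result)
```

Setting `__name__` matters here: without it, both generated functions would share the name `percentile_`, and agg() would have no way to tell the two output columns apart.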