
nth percentile calculations in PostgreSQL

Surprisingly, I've been unable to find an nth-percentile function for PostgreSQL.

I am using this via the Mondrian OLAP tool, so I just need an aggregate function that returns a 95th percentile.

I did find this link:

http://www.postgresql.org/message-id/[email protected]

But for some reason the percentile function in that post returns NULLs for certain queries. I've checked the data, and there's nothing odd in it that would explain this.

Codek asked Jan 14 '13 10:01




2 Answers

Since PostgreSQL 9.4 there is native support for percentiles, implemented as ordered-set aggregate functions:

percentile_cont(fraction) WITHIN GROUP (ORDER BY sort_expression)  

continuous percentile: returns a value corresponding to the specified fraction in the ordering, interpolating between adjacent input items if needed

percentile_cont(fractions) WITHIN GROUP (ORDER BY sort_expression) 

multiple continuous percentile: returns an array of results matching the shape of the fractions parameter, with each non-null element replaced by the value corresponding to that percentile
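For the 95th percentile asked about in the question, the native call would be `percentile_cont(0.95) WITHIN GROUP (ORDER BY your_column)`, where `your_column` is a placeholder for the real column. To make the interpolation concrete, here is a plain-Python model of what the single-fraction form computes, assuming the behaviour of PostgreSQL's implementation: NULL inputs are skipped, an empty group yields NULL, and the result interpolates linearly at position fraction * (N - 1) of the N sorted values. It is a sketch for checking results by hand, not PostgreSQL's own code.

```python
import math
from typing import Iterable, Optional


def percentile_cont(fraction: float, values: Iterable[Optional[float]]) -> Optional[float]:
    """Model of percentile_cont(fraction) WITHIN GROUP (ORDER BY value).

    None stands in for SQL NULL: NULL inputs are ignored, and an empty
    group returns None.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    ordered = sorted(v for v in values if v is not None)
    if not ordered:
        return None
    pos = fraction * (len(ordered) - 1)  # 0-based position in the sorted values
    lo, hi = math.floor(pos), math.ceil(pos)
    if lo == hi:
        return float(ordered[lo])  # landed exactly on an input value
    # Linear interpolation between the two neighbouring sorted values.
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


print(percentile_cont(0.95, range(1, 21)))
```

For example, over the values 2, 3, 4, 5, 7, 8, 10, 12 the fraction 0.25 lands at position 1.75, so the result is 3 + 0.75 * (4 - 3) = 3.75.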

See the documentation for more details: http://www.postgresql.org/docs/current/static/functions-aggregate.html

and see here for some examples: https://github.com/michaelpq/michaelpq.github.io/blob/master/_posts/2014-02-27-postgres-9-4-feature-highlight-within-group.markdown

    CREATE TABLE aa AS SELECT generate_series(1,20) AS a;
    -- SELECT 20

    WITH subset AS (
        SELECT a AS val,
            ntile(4) OVER (ORDER BY a) AS tile
        FROM aa
    )
    SELECT tile, max(val)
    FROM subset
    GROUP BY tile
    ORDER BY tile;

     tile | max
    ------+-----
        1 |   5
        2 |  10
        3 |  15
        4 |  20
    (4 rows)
alfonx answered Sep 24 '22 19:09


The ntile function is very useful here. I have a table test_temp:

    select * from test_temp;

     score
    -------
         3
         5
         2
        10
         4
         8
         7
        12

    select score, ntile(4) over (order by score) as quartile from test_temp;

     score | quartile
    -------+----------
         2 |        1
         3 |        1
         4 |        2
         5 |        2
         7 |        3
         8 |        3
        10 |        4
        12 |        4

ntile(4) over (order by score) orders the rows by score, splits them into four groups that are as equal in size as possible (when the row count is not a multiple of 4, the leading groups get one extra row), and assigns each row its group number in that order.
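The bucketing rule can be sketched in a few lines of Python. This is a model of `ntile`, not PostgreSQL's code; the helper name is made up for illustration, and it assumes the rows are already in `ORDER BY` order.

```python
from typing import List


def ntile(num_buckets: int, num_rows: int) -> List[int]:
    """Bucket numbers (1-based) for num_rows rows that are already sorted.

    Models ntile(num_buckets): bucket sizes differ by at most one, and when
    num_rows is not a multiple of num_buckets the leading buckets take the
    extra rows.
    """
    if num_buckets <= 0:
        raise ValueError("num_buckets must be greater than zero")
    base, extra = divmod(num_rows, num_buckets)
    tiles: List[int] = []
    for bucket in range(1, num_buckets + 1):
        size = base + (1 if bucket <= extra else 0)  # leading buckets get +1
        tiles.extend([bucket] * size)
    return tiles


# The test_temp scores, in ORDER BY score order.
scores = sorted([3, 5, 2, 10, 4, 8, 7, 12])
for score, quartile in zip(scores, ntile(4, len(scores))):
    print(score, quartile)
```

This reproduces the quartile column above. With 10 rows and 3 buckets the group sizes come out as 4, 3, 3.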

Since I have 8 numbers here, in sorted order they represent the 0th, 12.5th, 25th, 37.5th, 50th, 62.5th, 75th and 87.5th percentiles (the value at position i, counting from 0, sits at 100 * i / 8). So if I only take the rows where the quartile is 2, I'll have the 25th and 37.5th percentiles.
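The percentile figures in that paragraph follow the convention that the value at 0-based position i of n sorted values is the 100 * i / n percentile. A small Python check of the arithmetic, pairing each score with its rank and keeping quartile 2; the function name is hypothetical. Note that `percentile_cont` uses a different convention (position fraction * (n - 1)), so its numbers will not match these exactly.

```python
from typing import List


def percentile_ranks(n: int) -> List[float]:
    """Percentile of each 0-based position i in a sorted list of n values,
    using the 100 * i / n convention."""
    return [100 * i / n for i in range(n)]


scores = sorted([3, 5, 2, 10, 4, 8, 7, 12])  # 2, 3, 4, 5, 7, 8, 10, 12
quartiles = [1, 1, 2, 2, 3, 3, 4, 4]         # the ntile(4) column shown above
ranks = percentile_ranks(len(scores))

# Keep only the rows whose quartile is 2, paired with their percentile.
in_q2 = [(s, r) for s, q, r in zip(scores, quartiles, ranks) if q == 2]
print(in_q2)  # [(4, 25.0), (5, 37.5)]
```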

    with ranked_test as (
        select score, ntile(4) over (order by score) as quartile
        from test_temp
    )
    select min(score)
    from ranked_test
    where quartile = 2
    group by quartile;

returns 4, the third-lowest number in the list of 8.

If you had a larger table and used ntile(100), the column you filter on would be the percentile, and you could use the same query as above.
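A rough Python sketch of the larger-table idea, assuming the same convention as the quartile query (the minimum of quartile 2 landed on the 25th percentile). Under that convention the minimum of tile k out of 100 sits at the (k - 1)th percentile, so the 95th percentile is read from tile 96. The 1000-row table and the helper names are hypothetical; this models the SQL rather than running it.

```python
from typing import List, Optional, Sequence


def ntile(num_buckets: int, num_rows: int) -> List[int]:
    # ntile-style bucketing: sizes differ by at most one and the
    # leading buckets take any extra rows.
    base, extra = divmod(num_rows, num_buckets)
    tiles: List[int] = []
    for bucket in range(1, num_buckets + 1):
        tiles.extend([bucket] * (base + (1 if bucket <= extra else 0)))
    return tiles


def ntile_percentile(values: Sequence[int], pct: int) -> Optional[int]:
    """min(value) of tile pct + 1 under ntile(100): the quartile query,
    scaled up to 100 tiles. Returns None (SQL NULL) when that tile is
    empty, which happens with fewer than pct + 1 rows."""
    ordered = sorted(values)
    tiles = ntile(100, len(ordered))
    in_tile = [v for v, t in zip(ordered, tiles) if t == pct + 1]
    return min(in_tile) if in_tile else None


values = list(range(1, 1001))  # hypothetical 1000-row table holding 1..1000
print(ntile_percentile(values, 95))  # 951
```

For comparison, `percentile_cont(0.95)` over the same 1..1000 values interpolates to about 950.05. The two approaches agree closely on large tables, but the ntile route never interpolates and returns nothing for the 95th percentile until the table has at least 96 rows.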

Mike answered Sep 22 '22 19:09