
T-SQL: Calculating the Nth Percentile Value from column

Tags: sql, tsql

I have a column of data, some of which are NULL values, from which I wish to extract the single 90th percentile value:

ColA
-----
NULL
100
200
300
NULL
400
500
600
700
800
900
1000

For the above, I am looking for a technique that returns 900 for the 90th percentile, 800 for the 80th percentile, and so on. It would work like an aggregate: AVG(ColA) returns 550 for the above data, and MIN(ColA) returns 100.
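A minimal sketch of why the NULLs matter. It uses Python's built-in `sqlite3` module with an in-memory SQLite database as a stand-in for SQL Server (an assumption made so the example is self-contained; the table `t` and column `ColA` come from the question and answer). Aggregates such as AVG and MIN skip NULLs, but COUNT(*) counts every row, so a percentile calculation has to use the non-NULL count.

```python
import sqlite3

# The question's sample data; None becomes SQL NULL.
COL_A = [None, 100, 200, 300, None, 400, 500, 600, 700, 800, 900, 1000]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ColA INTEGER)")
conn.executemany("INSERT INTO t (ColA) VALUES (?)", [(v,) for v in COL_A])

# AVG, MIN and COUNT(ColA) ignore NULLs; COUNT(*) counts every row.
avg_a, min_a, all_rows, non_null_rows = conn.execute(
    "SELECT AVG(ColA), MIN(ColA), COUNT(*), COUNT(ColA) FROM t"
).fetchone()

print(avg_a, min_a, all_rows, non_null_rows)  # 550.0 100 12 10
```

With N = 12 instead of 10, any rank-based percentile would land on the wrong row, which is why the NULLs have to be filtered out first.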

Any suggestions?

asked Aug 10 '12 at 17:08 by jbeldock


People also ask

How do you find the nth percentile of data?

Percentiles can be calculated with the formula n = (P/100) × N, where P is the percentile, N is the number of values in the data set (sorted from smallest to largest), and n is the ordinal rank of the value at that percentile. When n is not a whole number, the nearest-rank method rounds it up to the next rank. Percentiles are frequently used to interpret test scores and biometric measurements.
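The formula can be sketched in Python as follows. Taking the ceiling of n (the nearest-rank convention) is an assumption for the fractional case, and the function name `nearest_rank_percentile` is hypothetical. NULLs (`None`) are dropped before counting, as SQL aggregates do.

```python
def nearest_rank_percentile(values, p):
    """Return the p-th percentile (integer p, 0 < p <= 100) by nearest rank.

    Hypothetical helper: n = ceil(p / 100 * N), where N counts only non-None values.
    """
    data = sorted(v for v in values if v is not None)
    if not data or not 0 < p <= 100:
        raise ValueError("need at least one non-None value and 0 < p <= 100")
    # Integer ceiling of p * N / 100; avoids floating-point rounding surprises.
    rank = (p * len(data) + 99) // 100
    return data[rank - 1]  # ordinal ranks are 1-based, list indices are 0-based


col_a = [None, 100, 200, 300, None, 400, 500, 600, 700, 800, 900, 1000]
print(nearest_rank_percentile(col_a, 90))  # 900
print(nearest_rank_percentile(col_a, 80))  # 800
```

For the question's data N = 10, so the 90th percentile is rank 9 (900) and the 80th is rank 8 (800), matching the values the question asks for.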

How do you find the percentile in SQL query?

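PERCENT_RANK(): the PERCENT_RANK function in SQL Server calculates the relative rank of each row within a group of rows as (RANK - 1) / (number of rows - 1). Its values range from 0 (the first row) to 1. NULLs are not skipped: they are included and treated as the lowest possible values, so filter them out if they should not count.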
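A small sketch of PERCENT_RANK, again with SQLite standing in for SQL Server. This is an assumption: SQLite's `percent_rank()` uses the same (rank - 1) / (rows - 1) formula and also sorts NULLs first in ascending order. Running it with and without a `WHERE ColA IS NOT NULL` filter shows the NULL rows taking the lowest ranks unless they are removed. The helper name `percent_ranks` is hypothetical.

```python
import sqlite3

COL_A = [None, 100, 200, 300, None, 400, 500, 600, 700, 800, 900, 1000]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ColA INTEGER)")
conn.executemany("INSERT INTO t (ColA) VALUES (?)", [(v,) for v in COL_A])


def percent_ranks(skip_nulls):
    """Return (ColA, PERCENT_RANK) pairs in ascending ColA order (NULLs first)."""
    where = "WHERE ColA IS NOT NULL" if skip_nulls else ""
    sql = (
        "SELECT ColA, PERCENT_RANK() OVER (ORDER BY ColA) "
        f"FROM t {where} ORDER BY ColA"
    )
    return conn.execute(sql).fetchall()


with_nulls = percent_ranks(skip_nulls=False)
without_nulls = percent_ranks(skip_nulls=True)
print(with_nulls[:3])
print(without_nulls[0], without_nulls[-1])  # (100, 0.0) (1000, 1.0)
```

With the NULLs included, the two NULL rows share rank 1 (percent rank 0) and 100 drops to 2/11; without them, 100 gets 0 and 1000 gets 1.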

How do you find the 50th percentile in SQL?

For example, PERCENTILE_DISC(0.5) computes the 50th percentile (that is, the median) of an expression. PERCENTILE_DISC calculates the percentile based on a discrete distribution of the column values, so the result is always one of the actual column values. NULLs are ignored.
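SQLite has no PERCENTILE_DISC, so the sketch below emulates SQL Server's documented definition (the smallest value whose CUME_DIST is greater than or equal to P, with NULLs ignored) using SQLite's `cume_dist()`. The helper name `percentile_disc` is hypothetical; the native SQL Server 2012+ form appears only as a comment.

```python
import sqlite3

COL_A = [None, 100, 200, 300, None, 400, 500, 600, 700, 800, 900, 1000]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ColA INTEGER)")
conn.executemany("INSERT INTO t (ColA) VALUES (?)", [(v,) for v in COL_A])


def percentile_disc(p):
    """Emulate PERCENTILE_DISC(p): smallest non-NULL ColA with CUME_DIST >= p.

    In SQL Server 2012+ the native form would be:
      SELECT DISTINCT PERCENTILE_DISC(0.9) WITHIN GROUP (ORDER BY ColA) OVER () FROM t;
    """
    sql = """
        SELECT MIN(ColA)
        FROM (SELECT ColA, CUME_DIST() OVER (ORDER BY ColA) AS cd
              FROM t
              WHERE ColA IS NOT NULL)  -- NULLs are ignored
        WHERE cd >= ?
    """
    return conn.execute(sql, (p,)).fetchone()[0]


print(percentile_disc(0.9), percentile_disc(0.8), percentile_disc(0.5))  # 900 800 500
```

Over the 10 non-NULL values, CUME_DIST of 900 is 9/10, so it is the first value to reach 0.9.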

How is the 95th percentile calculated in SQL?

NTILE(n) divides the ordered rows into n groups ("tiles") that are as close to equal in size as possible; when the rows do not divide evenly, the larger groups come first. For the 95th percentile with NTILE(100), you want the point where the data is divided for the 95th time. That would be the START of the 95th tile, i.e. its MIN, not its MAX.
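To see how NTILE forms its tiles, the sketch below reports the MIN and MAX of each tile over the question's non-NULL values (SQLite again standing in for SQL Server, an assumption; the helper `tile_bounds` is hypothetical). With 10 rows and 4 tiles the groups cannot be equal, so the two larger groups come first. With 10 tiles each row is its own tile, so the MIN/MAX distinction only matters when a tile holds more than one row.

```python
import sqlite3

COL_A = [None, 100, 200, 300, None, 400, 500, 600, 700, 800, 900, 1000]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ColA INTEGER)")
conn.executemany("INSERT INTO t (ColA) VALUES (?)", [(v,) for v in COL_A])


def tile_bounds(k):
    """Return (tile, MIN(ColA), MAX(ColA)) for each NTILE(k) group, NULLs excluded."""
    sql = f"""
        SELECT tile, MIN(ColA), MAX(ColA)
        FROM (SELECT ColA, NTILE({int(k)}) OVER (ORDER BY ColA) AS tile
              FROM t
              WHERE ColA IS NOT NULL)
        GROUP BY tile
        ORDER BY tile
    """
    return conn.execute(sql).fetchall()


print(tile_bounds(4))  # [(1, 100, 300), (2, 400, 600), (3, 700, 800), (4, 900, 1000)]
print(tile_bounds(10)[8])  # (9, 900, 900)
```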


1 Answer

If you want to get exactly the 90th percentile value, excluding NULLs, I would suggest doing the calculation directly. The following version calculates the row number and number of rows, and selects the appropriate value:

select max(case when rownum*1.0/numrows <= 0.9 then colA end) as percentile_90th  -- *1.0 avoids integer division
from (select colA,
             row_number() over (order by colA) as rownum,  -- 1-based position in sorted order
             count(*) over () as numrows                   -- number of non-NULL rows
      from t
      where colA is not null                               -- drop NULLs before ranking
     ) t

I put the condition in the SELECT clause rather than the WHERE clause, so you can easily get the 50th, 17th, or any other percentile, even several in the same query, by adding another max(case ...) expression with a different threshold.
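The answer's query can be tried outside SQL Server with a sketch like the one below, which runs the same ROW_NUMBER / COUNT(*) OVER () logic against SQLite (an assumption made so the example is self-contained; the helper name `percentile_by_rownum` is hypothetical). The threshold is passed as a parameter, and a second query puts two thresholds in one SELECT, which is what keeping the condition in the SELECT clause allows.

```python
import sqlite3

COL_A = [None, 100, 200, 300, None, 400, 500, 600, 700, 800, 900, 1000]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ColA INTEGER)")
conn.executemany("INSERT INTO t (ColA) VALUES (?)", [(v,) for v in COL_A])

# The answer's derived table: position and count over the non-NULL values.
RANKED = """
    SELECT ColA,
           ROW_NUMBER() OVER (ORDER BY ColA) AS rownum,
           COUNT(*) OVER () AS numrows
    FROM t
    WHERE ColA IS NOT NULL
"""


def percentile_by_rownum(p):
    """Largest ColA whose rownum / numrows is still <= p."""
    sql = f"SELECT MAX(CASE WHEN rownum * 1.0 / numrows <= ? THEN ColA END) FROM ({RANKED})"
    return conn.execute(sql, (p,)).fetchone()[0]


# Several percentiles in one pass, one MAX(CASE ...) per threshold.
p50, p90 = conn.execute(
    "SELECT MAX(CASE WHEN rownum * 1.0 / numrows <= 0.5 THEN ColA END), "
    "       MAX(CASE WHEN rownum * 1.0 / numrows <= 0.9 THEN ColA END) "
    f"FROM ({RANKED})"
).fetchone()

print(percentile_by_rownum(0.9), percentile_by_rownum(0.8), p50, p90)  # 900 800 500 900
```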

answered Oct 21 '22 at 04:10 by Gordon Linoff