Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Calculate Percentile Rank using NTILE?

Need to calculate the percentile rank (1st - 99th percentile) for each student with a score for a single test.

I'm a little confused by the msdn definition of NTILE, because it does not explicitly mention percentile rank. I need some sort of assurance that NTILE is the correct keyword to use for calculating percentile rank.

declare @temp table
(
  StudentId int,
  Score int
)
insert into @temp
select 1, 20
union
select 2, 25
.....

select NTILE(100) OVER (order by Score) PercentileRank
from @temp

It looks correct to me, but is this the correct way to calculate percentile rank?

like image 277
O.O Avatar asked Mar 22 '12 19:03

O.O


3 Answers

NTILE is absolutely NOT the same as percentile rank. NTILE simply divides up a set of data evenly by the number provided (as noted by RoyiNamir above). If you chart the results of both functions, NTILE will be a perfectly linear line from 1-to-n, whereas percentile rank will [usually] have some curves to it depending on your data.

Percentile rank is much more complicated than simply dividing it up by N. It then takes each row's number and figures out where in the distribution it lies, interpolating when necessary (which is very CPU intensive). I have an Excel sheet of 525,000 rows and it dominates my 8-core machine's CPU at 100% for 15-20 minutes just to figure out the PERCENTRANK function for a single column.

like image 153
David Storfer Avatar answered Nov 15 '22 10:11

David Storfer


Is there a typo?

select NTILE(100) OVER (order by Score) PercentileRank 
from @temp

And your script looks good. If you think something wrong there, could you clarify what excactly?

like image 25
Andrey Gurinov Avatar answered Nov 15 '22 08:11

Andrey Gurinov


One way to think of this is, "the percentage of Students with Scores below this one."

Here is one way to get that type of percentile in SQL Server, using RANK():

select *
    , (rank() over (order by Score) - 1.0) / (select count(*) from @temp) * 100 as PercentileRank
from @temp

Note that this will always be less than 100% unless you round up, and you will always get 0% for the lowest value(s). This does not necessarily put the median value at 50%, nor will it interpolate like some percentile calculations do.

Feel free to round or cast the whole expression (e.g. cast(... as decimal(4,2))) for good looking reports, or even replace - 1.0 with - 1e to force floating point calculation.

NTILE() isn't really what you're looking for in this case because it essentially divides the row numbers of an ordered set into groups rather than the values. It will assign a different percentile to two instances of the same value if those instances happen to straddle a crossover point. You'd have to then additionally group by that value and grab the max or min percentile of the group to use NTILE() in the same way as we're doing with RANK().

like image 28
Tim Lehner Avatar answered Nov 15 '22 10:11

Tim Lehner