Need to calculate the percentile rank (1st - 99th percentile) for each student with a score for a single test.
I'm a little confused by the msdn definition of NTILE, because it does not explicitly mention percentile rank. I need some sort of assurance that NTILE is the correct keyword to use for calculating percentile rank.
declare @temp table
(
StudentId int,
Score int
)
insert into @temp
select 1, 20
union
select 2, 25
.....
select NTILE(100) OVER (order by Score) PercentileRank
from @temp
It looks correct to me, but is this the correct way to calculate percentile rank?
NTILE is absolutely NOT the same as percentile rank. NTILE simply divides up a set of data evenly by the number provided (as noted by RoyiNamir above). If you chart the results of both functions, NTILE will be a perfectly linear line from 1-to-n, whereas percentile rank will [usually] have some curves to it depending on your data.
Percentile rank is much more complicated than simply dividing it up by N. It then takes each row's number and figures out where in the distribution it lies, interpolating when necessary (which is very CPU intensive). I have an Excel sheet of 525,000 rows and it dominates my 8-core machine's CPU at 100% for 15-20 minutes just to figure out the PERCENTRANK function for a single column.
Is there a typo?
select NTILE(100) OVER (order by Score) PercentileRank
from @temp
And your script looks good. If you think something wrong there, could you clarify what excactly?
One way to think of this is, "the percentage of Students with Scores below this one."
Here is one way to get that type of percentile in SQL Server, using RANK()
:
select *
, (rank() over (order by Score) - 1.0) / (select count(*) from @temp) * 100 as PercentileRank
from @temp
Note that this will always be less than 100% unless you round up, and you will always get 0% for the lowest value(s). This does not necessarily put the median value at 50%, nor will it interpolate like some percentile calculations do.
Feel free to round or cast the whole expression (e.g. cast(... as decimal(4,2))
) for good looking reports, or even replace - 1.0
with - 1e
to force floating point calculation.
NTILE()
isn't really what you're looking for in this case because it essentially divides the row numbers of an ordered set into groups rather than the values. It will assign a different percentile to two instances of the same value if those instances happen to straddle a crossover point. You'd have to then additionally group by that value and grab the max or min percentile of the group to use NTILE()
in the same way as we're doing with RANK()
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With