I have a very large table of measurement data in MySQL, and I need to compute the percentile rank for every one of these values. Oracle appears to have a function called percent_rank, but I can't find anything similar for MySQL. Sure, I could just brute-force it in Python, which I use anyway to populate the table, but I suspect that would be quite inefficient because one sample might have 200,000 observations.
The PERCENT_RANK() function returns a percentile rank that ranges from zero to one, computed as (rank - 1) / (total_rows - 1). In this formula, rank is the rank of the specified row and total_rows is the number of rows being evaluated. Based on this formula, PERCENT_RANK() always returns zero for the first row in a partition or result set.
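To make the definition concrete, here is a minimal Python sketch of the usual PERCENT_RANK() formula, (rank - 1) / (total_rows - 1), where tied values share the lowest rank. The function name `percent_rank` and the sample values are made up for illustration; this is not MySQL's implementation, only the arithmetic behind it.

```python
def percent_rank(values):
    """Return (rank - 1) / (total_rows - 1) for each value, in input order."""
    total_rows = len(values)
    if total_rows <= 1:
        # A single row is always "first", so its percent rank is zero.
        return [0.0] * total_rows
    ordered = sorted(values)
    # rank = 1-based position of the first occurrence in sorted order,
    # so tied values share the same (lowest) rank.
    first_position = {}
    for position, value in enumerate(ordered, start=1):
        first_position.setdefault(value, position)
    return [(first_position[v] - 1) / (total_rows - 1) for v in values]


ranks = percent_rank([10, 20, 20, 40, 50])
print(ranks)
```

The smallest value gets 0, the largest gets 1, and the two tied values get the same result.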
Get the count of values:

    SELECT COUNT(*) AS cnt FROM t;

Then get the nth value, where n = (cnt - 1) * (1 - 0.95):

    SELECT k FROM t ORDER BY k DESC LIMIT n, 1;
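The two steps above can be sketched in Python to check the arithmetic. The helper name `nth_from_top` is hypothetical, and truncating n to an integer is my assumption: the answer does not say how to round, but a LIMIT offset has to be a whole number.

```python
import math


def nth_from_top(values, fraction=0.95):
    """Mimic: ORDER BY k DESC LIMIT n, 1 with n = (cnt - 1) * (1 - fraction)."""
    cnt = len(values)  # SELECT COUNT(*) AS cnt FROM t
    # Round away floating-point noise first, then truncate (assumed rounding rule).
    n = math.floor(round((cnt - 1) * (1 - fraction), 9))
    # LIMIT n, 1 skips n rows and returns the next one.
    return sorted(values, reverse=True)[n]


sample = list(range(1, 102))  # 101 values: 1..101
print(nth_from_top(sample))
```

With 101 values, n is 5, so five rows are skipped from the top and the sixth-largest value is returned. The sketch assumes a non-empty list.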
NTILE divides the ordered data into a given number of "tiles", and you can think of every tile as having the same size. For your 95th percentile, you want the place where the data is divided for the 95th time. That would be the START of the 95th tile, i.e. its MIN, not its MAX.
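As a rough illustration of the tiling idea, the following Python sketch splits sorted data into equal-size tiles and records where each tile starts. The function name `ntile_starts` is hypothetical. When the rows don't divide evenly, I'm assuming the usual SQL NTILE convention that the earlier tiles get one extra row.

```python
def ntile_starts(values, tiles=100):
    """Return the smallest value of each tile, in ascending tile order."""
    ordered = sorted(values)
    base, extra = divmod(len(ordered), tiles)
    starts = []
    index = 0
    for tile in range(tiles):
        # The first `extra` tiles hold one more row than the rest (assumed convention).
        size = base + (1 if tile < extra else 0)
        if size == 0:
            break  # fewer rows than tiles: the remaining tiles are empty
        starts.append(ordered[index])
        index += size
    return starts


data = list(range(1, 201))       # 200 values, so each of 100 tiles holds 2
starts = ntile_starts(data, 100)
print(starts[94])                # start (MIN) of the 95th tile
```

Here the 95th tile holds 189 and 190, so its start is 189, while its MAX would be 190.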
Here's a different approach that doesn't require a join. In my case (a table with 15,000+ rows), it runs in about 3 seconds. (The JOIN method takes an order of magnitude longer.)
In the sample, assume that measure is the column on which you're calculating the percent rank, and id is just a row identifier (not required):
SELECT
    id,
    @prev := @curr AS prev,
    @curr := measure AS curr,
    @rank := IF(@prev > @curr, @rank + @ties, @rank) AS `rank`,  -- quoted: RANK is a reserved word in MySQL 8.0
    @ties := IF(@prev = @curr, @ties + 1, 1) AS ties,
    (1 - @rank / @total) AS percentrank
FROM
    mytable,
    (SELECT
        @curr := NULL,
        @prev := NULL,
        @rank := 0,
        @ties := 1,
        @total := COUNT(*)
     FROM mytable
     WHERE measure IS NOT NULL
    ) b
WHERE
    measure IS NOT NULL
ORDER BY
    measure DESC
Credit for this method goes to Shlomi Noach. He writes about it in detail here:
http://code.openark.org/blog/mysql/sql-ranking-without-self-join
I've tested this in MySQL and it works great; no idea about Oracle, SQL Server, etc.
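To see what the variable-based query computes, here is a small Python walk-through of the same bookkeeping (prev, curr, rank, ties, total). It assumes the variables are evaluated left to right for each row in descending order of measure, which is what the query relies on. The function name `percent_rank_desc` is made up for this sketch.

```python
def percent_rank_desc(measures):
    """Emulate the @prev/@curr/@rank/@ties walk over rows sorted descending."""
    rows = sorted((m for m in measures if m is not None), reverse=True)
    total = len(rows)            # @total := COUNT(*) over non-NULL measures
    prev = curr = None           # @prev := NULL, @curr := NULL
    rank, ties = 0, 1            # @rank := 0, @ties := 1
    result = []
    for measure in rows:
        prev, curr = curr, measure
        # @rank := IF(@prev > @curr, @rank + @ties, @rank)
        if prev is not None and prev > curr:
            rank += ties
        # @ties := IF(@prev = @curr, @ties + 1, 1)
        ties = ties + 1 if prev == curr else 1
        result.append((measure, 1 - rank / total))
    return result


for measure, pr in percent_rank_desc([40, 10, None, 50, 40]):
    print(measure, pr)
```

Tied values end up with the same result, and the largest value gets 1. Because the divisor is the total row count, the smallest value gets a result above zero, so the output is close to, but not identical to, the (rank - 1) / (total_rows - 1) definition.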