
Calculating percentile rank in MySQL

I have a very large table of measurement data in MySQL and need to compute the percentile rank of every value. Oracle appears to have a function called percent_rank, but I can't find anything similar for MySQL. I could brute-force it in Python, which I use anyway to populate the table, but I suspect that would be quite inefficient because a single sample might have 200,000 observations.

asked Jun 29 '09 07:06 by lhahne



1 Answer

Here's a different approach that doesn't require a join. In my case (a table with 15,000+ rows), it runs in about 3 seconds; the JOIN method takes an order of magnitude longer.

In the sample, assume that measure is the column on which you're calculating the percent rank, and id is just a row identifier (not required):

    SELECT
        id,
        @prev := @curr AS prev,
        @curr := measure AS curr,
        -- a new, lower value moves the rank past the previous group of ties
        -- (backticks because RANK is a reserved word in MySQL 8.0+)
        @rank := IF(@prev > @curr, @rank + @ties, @rank) AS `rank`,
        @ties := IF(@prev = @curr, @ties + 1, 1) AS ties,
        (1 - @rank / @total) AS percentrank
    FROM
        mytable,
        (SELECT
            @curr := NULL,
            @prev := NULL,
            @rank := 0,
            @ties := 1,
            @total := COUNT(*)
         FROM mytable
         WHERE measure IS NOT NULL
        ) b
    WHERE
        measure IS NOT NULL
    ORDER BY
        measure DESC

Credit for this method goes to Shlomi Noach. He writes about it in detail here:

http://code.openark.org/blog/mysql/sql-ranking-without-self-join

I've tested this in MySQL and it works great; I have no idea about Oracle, SQL Server, etc.
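
To see what the query's variables are doing, the same single pass can be traced in Python. This is only a rough sketch of the bookkeeping, not the original author's code: the function name percent_ranks and the sample data are made up for illustration, and the input is assumed to be a plain list of numbers with None standing in for NULL.

```python
def percent_ranks(values):
    """Walk the values once in descending order, mirroring the
    @prev / @curr / @rank / @ties variables of the SQL query."""
    # Like the WHERE clause, drop missing measurements first.
    ordered = sorted((v for v in values if v is not None), reverse=True)
    total = len(ordered)

    prev = curr = None
    rank, ties = 0, 1
    result = []
    for value in ordered:
        prev, curr = curr, value
        # A strictly lower value jumps past the whole group of ties before it.
        if prev is not None and prev > curr:
            rank += ties
        # Count how many consecutive rows share the current value.
        ties = ties + 1 if prev == curr else 1
        result.append((curr, 1 - rank / total))
    return result


# Hypothetical sample: one tie (20, 20) and one missing value.
print(percent_ranks([10, 20, 20, 30, None]))
# [(30, 1.0), (20, 0.75), (20, 0.75), (10, 0.25)]
```

Tied values share a percent rank, and with this formula the largest value gets 1 while the smallest gets 1/total when it is not tied, so the result never reaches 0.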

answered Oct 06 '22 02:10 by mattstuehler