Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Ignoring null values in in a postgresql rank() window function

Tags:

sql

postgresql

I am writing a SQL query using PostgreSQL that needs to rank people that "arrive" at some location. Not everyone arrives however. I am using a rank() window function to generate arrival ranks, but in the places where the arrival time is null, rather than returning a null rank, the rank() aggregate function just treats them as if they arrived after everyone else. What I want to happen is that these no-shows get a rank of NULL instead of this imputed rank.

Here is an example. Suppose I have a table dinner_show_up that looks like this:

 | Person | arrival_time | Restaurant |
 +--------+--------------+------------+
 | Dave   |     7        | in_and_out |
 | Mike   |     2        | in_and_out | 
 | Bob    |   NULL       | in_and_out | 

Bob never shows up. The query I'm writing would be:

select Person, 
       rank() over (partition by Restaurant order by arrival_time asc) 
               as arrival_rank
from dinner_show_up; 

And the result will be

 | Person | arrival_rank | 
 +--------+--------------+
 | Dave   |     2        | 
 | Mike   |     1        | 
 | Bob    |     3        |  

What I want to happen instead is this:

 | Person | arrival_rank | 
 +--------+--------------+
 | Dave   |     2        | 
 | Mike   |     1        | 
 | Bob    |     NULL     |  
like image 252
John Horton Avatar asked Aug 12 '13 18:08

John Horton


2 Answers

Just use a case statement around the rank():

select Person, 
       (case when arrival_time is not null
             then rank() over (partition by Restaurant order by arrival_time asc) 
        end) as arrival_rank
from dinner_show_up; 
like image 94
Gordon Linoff Avatar answered Nov 18 '22 10:11

Gordon Linoff


A more general solution for all aggregate functions, not only rank(), is to partition by 'arrival_time is not null' in the over() clause. That will cause all null arrival_time rows to be placed into the same group and given the same rank, leaving the non-null rows to be ranked relative only to each other.

For the sake of a meaningful example, I mocked up a CTE having more rows than the intial problem set. Please forgive the wide rows, but I think they better contrast the differing techniques.

with dinner_show_up("person", "arrival_time", "restaurant") as (values
   ('Dave' ,    7, 'in_and_out')
  ,('Mike' ,    2, 'in_and_out')
  ,('Bob'  , null, 'in_and_out')
  ,('Peter',    3, 'in_and_out')
  ,('Jane' , null, 'in_and_out')
  ,('Merry',    5, 'in_and_out')
  ,('Sam'  ,    5, 'in_and_out')
  ,('Pip'  ,    9, 'in_and_out')
)

select 
   person
  ,case when arrival_time is not null then         rank() over (                                      order by arrival_time) end as arrival_rank_without_partition
  ,case when arrival_time is not null then         rank() over (partition by arrival_time is not null order by arrival_time) end as arrival_rank_with_partition
  ,case when arrival_time is not null then percent_rank() over (                                      order by arrival_time) end as arrival_pctrank_without_partition
  ,case when arrival_time is not null then percent_rank() over (partition by arrival_time is not null order by arrival_time) end as arrival_pctrank_with_partition
from dinner_show_up

This query gives the same results for arrival_rank_with/without_partition. However, the results for percent_rank() do differ: without_partition is wrong, ranging from 0% to 71.4%, whereas with_partition correctly gives pctrank() ranging from 0% to 100%.

This same pattern applies to the ntile() aggregate function, as well.

It works by separating all null values from non-null values for purposes of the ranking. This ensures that Jane and Bob are excluded from the percentile ranking of 0% to 100%.

 |person|arrival_rank_without_partition|arrival_rank_with_partition|arrival_pctrank_without_partition|arrival_pctrank_with_partition|
 +------+------------------------------+---------------------------+---------------------------------+------------------------------+
 |Jane  |null                          |null                       |null                             |null                          |
 |Bob   |null                          |null                       |null                             |null                          |
 |Mike  |1                             |1                          |0                                |0                             |
 |Peter |2                             |2                          |0.14                             |0.2                           |
 |Sam   |3                             |3                          |0.28                             |0.4                           |
 |Merry |4                             |4                          |0.28                             |0.4                           |
 |Dave  |5                             |5                          |0.57                             |0.8                           |
 |Pip   |6                             |6                          |0.71                             |1.0                           |
like image 10
pbnelson Avatar answered Nov 18 '22 10:11

pbnelson