I am writing a SQL query using PostgreSQL that needs to rank people that "arrive" at some location. Not everyone arrives however. I am using a rank()
window function to generate arrival ranks, but in the places where the arrival time is null, rather than returning a null rank, the rank()
aggregate function just treats them as if they arrived after everyone else. What I want to happen is that these no-shows get a rank of NULL
instead of this imputed rank.
Here is an example. Suppose I have a table dinner_show_up
that looks like this:
| Person | arrival_time | Restaurant |
+--------+--------------+------------+
| Dave | 7 | in_and_out |
| Mike | 2 | in_and_out |
| Bob | NULL | in_and_out |
Bob never shows up. The query I'm writing would be:
select Person,
rank() over (partition by Restaurant order by arrival_time asc)
as arrival_rank
from dinner_show_up;
And the result will be
| Person | arrival_rank |
+--------+--------------+
| Dave | 2 |
| Mike | 1 |
| Bob | 3 |
What I want to happen instead is this:
| Person | arrival_rank |
+--------+--------------+
| Dave | 2 |
| Mike | 1 |
| Bob | NULL |
Just use a case
statement around the rank()
:
select Person,
(case when arrival_time is not null
then rank() over (partition by Restaurant order by arrival_time asc)
end) as arrival_rank
from dinner_show_up;
A more general solution for all aggregate functions, not only rank(), is to partition by 'arrival_time is not null' in the over() clause. That will cause all null arrival_time rows to be placed into the same group and given the same rank, leaving the non-null rows to be ranked relative only to each other.
For the sake of a meaningful example, I mocked up a CTE having more rows than the intial problem set. Please forgive the wide rows, but I think they better contrast the differing techniques.
with dinner_show_up("person", "arrival_time", "restaurant") as (values
('Dave' , 7, 'in_and_out')
,('Mike' , 2, 'in_and_out')
,('Bob' , null, 'in_and_out')
,('Peter', 3, 'in_and_out')
,('Jane' , null, 'in_and_out')
,('Merry', 5, 'in_and_out')
,('Sam' , 5, 'in_and_out')
,('Pip' , 9, 'in_and_out')
)
select
person
,case when arrival_time is not null then rank() over ( order by arrival_time) end as arrival_rank_without_partition
,case when arrival_time is not null then rank() over (partition by arrival_time is not null order by arrival_time) end as arrival_rank_with_partition
,case when arrival_time is not null then percent_rank() over ( order by arrival_time) end as arrival_pctrank_without_partition
,case when arrival_time is not null then percent_rank() over (partition by arrival_time is not null order by arrival_time) end as arrival_pctrank_with_partition
from dinner_show_up
This query gives the same results for arrival_rank_with/without_partition. However, the results for percent_rank() do differ: without_partition is wrong, ranging from 0% to 71.4%, whereas with_partition correctly gives pctrank() ranging from 0% to 100%.
This same pattern applies to the ntile() aggregate function, as well.
It works by separating all null values from non-null values for purposes of the ranking. This ensures that Jane and Bob are excluded from the percentile ranking of 0% to 100%.
|person|arrival_rank_without_partition|arrival_rank_with_partition|arrival_pctrank_without_partition|arrival_pctrank_with_partition|
+------+------------------------------+---------------------------+---------------------------------+------------------------------+
|Jane |null |null |null |null |
|Bob |null |null |null |null |
|Mike |1 |1 |0 |0 |
|Peter |2 |2 |0.14 |0.2 |
|Sam |3 |3 |0.28 |0.4 |
|Merry |4 |4 |0.28 |0.4 |
|Dave |5 |5 |0.57 |0.8 |
|Pip |6 |6 |0.71 |1.0 |
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With