Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Customised ordering in Hive's row_number() over partition by order by window function

I have a table with an identifier column id and another column with string values column_b for which I would like to do customer ordering on column_b. Let's say that column_b consists of values A, B, C, D.

The syntax that can be used in Hive for row_number() over() is:

SELECT id, column_b, row_number() over(partition by id order by column_b) as row_id
FROM   some_table

Example see here for an example

However, I would like to do custom ordering on column_b instead of alphabetic ordering. The above syntax would yield something like:

enter image description here

Instead, I would like to explicitly order by column_b using the order A, C, D, B, that is:

hive

How can I achieve this?

like image 916
Anonymous Avatar asked Nov 21 '25 05:11

Anonymous


1 Answers

Explicitly specify the order using case statement. You can use other scalar functions in the order by:

SELECT id, column_b, 
       row_number() over(partition by id order by case column_b
                                                       when 'A' then '1'
                                                       when 'C' then '2'
                                                       when 'D' then '3'
                                                       when 'B' then '4'
                                                       --add more cases
                                                       --for example other values sort
                                                       --in natural order  
                                                       else column_b 
                                                       --or use constant
                                                       --to make sure 
                                                       --everything else is greater than 4 
                                                       --like this  else concat('5',column_b)
                                                  end 
                         ) as row_id
  FROM some_table

Also you can calculate order column in the subquery and use it in the window, it will work the same:

SELECT id, column_b, 
       row_number() over(partition by id order by orderby) as row_id
  FROM (select t.*, 
               case column_b
                    when 'A' then '1'
                    when 'C' then '2'
                    when 'D' then '3'
                    when 'B' then '4'  
                    else concat('5',column_b) 
               end orderby
           from some_table t
        ) s
like image 118
leftjoin Avatar answered Nov 23 '25 20:11

leftjoin



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!