Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do to the opposite of "row_number() over (partition by [Col] order by [Col])"

I am trying to combine duplicate entries in a data table and give them a new number.

Here is a example data set (runnable copy)

declare @tmpTable table
    (ID Varchar(1), 
     First varchar(4), 
     Last varchar(5), 
     Phone varchar(13),
     NonKeyField varchar(4))

insert into @tmpTable select 'A', 'John', 'Smith', '(555)555-1234', 'ASDF'
insert into @tmpTable select 'B', 'John', 'Smith', '(555)555-1234', 'GHJK'
insert into @tmpTable select 'C', 'Jane', 'Smith', '(555)555-1234', 'QWER'
insert into @tmpTable select 'D', 'John', 'Smith', '(555)555-1234', 'RTYU'
insert into @tmpTable select 'E', 'Bill', 'Blake', '(555)555-0000', 'BVNM'
insert into @tmpTable select 'F', 'Bill', 'Blake', '(555)555-0000', '%^&*'
insert into @tmpTable select 'G', 'John', 'Smith', '(555)555-1234', '!#RF'

select row_number() over (partition by First, Last, Phone order by ID) NewIDNum, *  
from @tmpTable order by ID

Right now it gives me the results

NewIDNum             ID   First Last  Phone         NonKeyField
-------------------- ---- ----- ----- ------------- -----------
1                    A    John  Smith (555)555-1234 ASDF
2                    B    John  Smith (555)555-1234 GHJK
1                    C    Jane  Smith (555)555-1234 QWER
3                    D    John  Smith (555)555-1234 RTYU
1                    E    Bill  Blake (555)555-0000 BVNM
2                    F    Bill  Blake (555)555-0000 %^&*
4                    G    John  Smith (555)555-1234 !#RF

However that is the opposite of what I want, the NewIDNum resets its counter ever time it finds a new combination of the key. I want all of the same combination to have the same ID. So if it was behaving the way I wanted I would get the following results

NewIDNum             ID   First Last  Phone         NonKeyField
-------------------- ---- ----- ----- ------------- -----------
1                    A    John  Smith (555)555-1234 ASDF
1                    B    John  Smith (555)555-1234 GHJK
2                    C    Jane  Smith (555)555-1234 QWER
1                    D    John  Smith (555)555-1234 RTYU
3                    E    Bill  Blake (555)555-0000 BVNM
3                    F    Bill  Blake (555)555-0000 %^&*
1                    G    John  Smith (555)555-1234 !#RF

What is the correct way to get the results I want?


I did not include this requirement in the original post: I need the NewIDNum to produce the same numbers on subsequent runs of this query for the existing rows if more rows get added (assuming all new rows will have a higher ID "value" if a order by is done on the ID column)

So if at a latter date the following was done

insert into @tmpTable select 'H', 'John', 'Smith', '(555)555-1234', '4321'
insert into @tmpTable select 'I', 'Jake', 'Jons', '(555)555-1234', '1234'
insert into @tmpTable select 'J', 'John', 'Smith', '(555)555-1234', '2345'

running the correct query again would give

NewIDNum             ID   First Last  Phone         NonKeyField
-------------------- ---- ----- ----- ------------- -----------
1                    A    John  Smith (555)555-1234 ASDF
1                    B    John  Smith (555)555-1234 GHJK
2                    C    Jane  Smith (555)555-1234 QWER
1                    D    John  Smith (555)555-1234 RTYU
3                    E    Bill  Blake (555)555-0000 BVNM
3                    F    Bill  Blake (555)555-0000 %^&*
1                    G    John  Smith (555)555-1234 !#RF
1                    H    John  Smith (555)555-1234 4321
4                    I    Jake  Jons  (555)555-1234 1234
1                    J    John  Smith (555)555-1234 2345
like image 630
Scott Chamberlain Avatar asked Oct 02 '12 18:10

Scott Chamberlain


2 Answers

You could use dense_rank():

dense_rank() over (order by First, Last, Phone) as NewIDNum

In response to your comment, you could sort on the minimum of the old Id column per group of rows with the same (First, Last, Phone) combination:

select  *
from    (
        select  dense_rank() over (order by min_id) as new_id
        ,       *
        from    (
                select  min(id) over (
                            partition by First, Last, Phone) as min_id
                ,       *
                from    @tmpTable 
                ) as sub1
        ) as sub3
order by
        new_id
like image 162
Andomar Avatar answered Oct 12 '22 09:10

Andomar


Building on @Andomar's original answer -- this will work on your updated requirements (though this won't likely scale nicely)

select
    DENSE_RANK() over (ORDER BY IdRank, First, Last, Phone) AS NewIDNum,
    ID,
    First,
    Last,
    Phone,
    NonKeyField
from
(
    select
        MIN(ID) OVER (PARTITION BY First, Last, Phone) as IdRank,
        *
    from
        @tmpTable
) as x
order by
    ID;
like image 30
etliens Avatar answered Oct 12 '22 09:10

etliens