I am trying to combine duplicate entries in a data table and give them a new number.
Here is a example data set (runnable copy)
declare @tmpTable table
(ID Varchar(1),
First varchar(4),
Last varchar(5),
Phone varchar(13),
NonKeyField varchar(4))
insert into @tmpTable select 'A', 'John', 'Smith', '(555)555-1234', 'ASDF'
insert into @tmpTable select 'B', 'John', 'Smith', '(555)555-1234', 'GHJK'
insert into @tmpTable select 'C', 'Jane', 'Smith', '(555)555-1234', 'QWER'
insert into @tmpTable select 'D', 'John', 'Smith', '(555)555-1234', 'RTYU'
insert into @tmpTable select 'E', 'Bill', 'Blake', '(555)555-0000', 'BVNM'
insert into @tmpTable select 'F', 'Bill', 'Blake', '(555)555-0000', '%^&*'
insert into @tmpTable select 'G', 'John', 'Smith', '(555)555-1234', '!#RF'
select row_number() over (partition by First, Last, Phone order by ID) NewIDNum, *
from @tmpTable order by ID
Right now it gives me the results
NewIDNum ID First Last Phone NonKeyField
-------------------- ---- ----- ----- ------------- -----------
1 A John Smith (555)555-1234 ASDF
2 B John Smith (555)555-1234 GHJK
1 C Jane Smith (555)555-1234 QWER
3 D John Smith (555)555-1234 RTYU
1 E Bill Blake (555)555-0000 BVNM
2 F Bill Blake (555)555-0000 %^&*
4 G John Smith (555)555-1234 !#RF
However that is the opposite of what I want, the NewIDNum
resets its counter ever time it finds a new combination of the key. I want all of the same combination to have the same ID. So if it was behaving the way I wanted I would get the following results
NewIDNum ID First Last Phone NonKeyField
-------------------- ---- ----- ----- ------------- -----------
1 A John Smith (555)555-1234 ASDF
1 B John Smith (555)555-1234 GHJK
2 C Jane Smith (555)555-1234 QWER
1 D John Smith (555)555-1234 RTYU
3 E Bill Blake (555)555-0000 BVNM
3 F Bill Blake (555)555-0000 %^&*
1 G John Smith (555)555-1234 !#RF
What is the correct way to get the results I want?
I did not include this requirement in the original post: I need the NewIDNum
to produce the same numbers on subsequent runs of this query for the existing rows if more rows get added (assuming all new rows will have a higher ID "value" if a order by is done on the ID column)
So if at a latter date the following was done
insert into @tmpTable select 'H', 'John', 'Smith', '(555)555-1234', '4321'
insert into @tmpTable select 'I', 'Jake', 'Jons', '(555)555-1234', '1234'
insert into @tmpTable select 'J', 'John', 'Smith', '(555)555-1234', '2345'
running the correct query again would give
NewIDNum ID First Last Phone NonKeyField
-------------------- ---- ----- ----- ------------- -----------
1 A John Smith (555)555-1234 ASDF
1 B John Smith (555)555-1234 GHJK
2 C Jane Smith (555)555-1234 QWER
1 D John Smith (555)555-1234 RTYU
3 E Bill Blake (555)555-0000 BVNM
3 F Bill Blake (555)555-0000 %^&*
1 G John Smith (555)555-1234 !#RF
1 H John Smith (555)555-1234 4321
4 I Jake Jons (555)555-1234 1234
1 J John Smith (555)555-1234 2345
You could use dense_rank()
:
dense_rank() over (order by First, Last, Phone) as NewIDNum
In response to your comment, you could sort on the minimum of the old Id
column per group of rows with the same (First, Last, Phone)
combination:
select *
from (
select dense_rank() over (order by min_id) as new_id
, *
from (
select min(id) over (
partition by First, Last, Phone) as min_id
, *
from @tmpTable
) as sub1
) as sub3
order by
new_id
Building on @Andomar's original answer -- this will work on your updated requirements (though this won't likely scale nicely)
select
DENSE_RANK() over (ORDER BY IdRank, First, Last, Phone) AS NewIDNum,
ID,
First,
Last,
Phone,
NonKeyField
from
(
select
MIN(ID) OVER (PARTITION BY First, Last, Phone) as IdRank,
*
from
@tmpTable
) as x
order by
ID;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With