Here's a simplified example of my problem. I have a table where there's a "Name" column with duplicate entries:
ID Name
--- ----
1 AAA
2 AAA
3 AAA
4 BBB
5 CCC
6 CCC
7 DDD
8 DDD
9 DDD
10 DDD
Doing a GROUP BY like SELECT Name, COUNT(*) AS [Count] FROM Table GROUP BY Name
results in this:
Name Count
---- -----
AAA 3
BBB 1
CCC 2
DDD 4
I'm only concerned about the duplicates, so I'll add a HAVING clause, SELECT Name, COUNT(*) AS [Count] FROM Table GROUP BY Name HAVING COUNT(*) > 1
:
Name Count
---- -----
AAA 3
CCC 2
DDD 4
Trivial so far, but now things get tricky: I need a query to get me all the duplicate records, but with a nice incrementing indicator added to the Name column. The result should look something like this:
ID Name
--- --------
1 AAA
2 AAA (2)
3 AAA (3)
5 CCC
6 CCC (2)
7 DDD
8 DDD (2)
9 DDD (3)
10 DDD (4)
Note row 4 with "BBB" is excluded, and the first duplicate keeps the original Name.
Using an EXISTS
statement gives me all the records I need, but how do I go about creating the new Name value?
SELECT * FROM Table AS T1
WHERE EXISTS (
SELECT Name, COUNT(*) AS [Count]
FROM Table
GROUP BY Name
HAVING (COUNT(*) > 1) AND (Name = T1.Name))
ORDER BY Name
I need to create an UPDATE statement that will fix all the duplicates, i.e. change the Name as per this pattern.
Update: Figured it out now. It was the PARTITION BY clause I was missing.
One way to find duplicate records from the table is the GROUP BY statement. The GROUP BY statement in SQL is used to arrange identical data into groups with the help of some functions. i.e if a particular column has the same values in different rows then it will arrange these rows in a group.
The DISTINCT keyword eliminates duplicate rows from a result.
With Dups As
(
Select Id, Name
, Row_Number() Over ( Partition By Name Order By Id ) As Rnk
From Table
)
Select D.Id
, D.Name + Case
When D.Rnk > 1 Then ' (' + Cast(D.Rnk As varchar(10)) + ')'
Else ''
End As Name
From Dups As D
If you want an update statement you can use pretty much the same structure:
With Dups As
(
Select Id, Name
, Row_Number() Over ( Partition By Name Order By Id ) As Rnk
From Table
)
Update Table
Set Name = T.Name + Case
When D.Rnk > 1 Then ' (' + Cast(D.Rnk As varchar(10)) + ')'
Else ''
End
From Table As T
Join Dups As D
On D.Id = T.Id
Just update the subquery directly:
update d
set Name = Name+'('+cast(r as varchar(10))+')'
from ( select Name,
row_number() over (partition by Name order by Name) as r
from [table]
) d
where r > 1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With