
Rename duplicate rows

Here's a simplified example of my problem. I have a table where there's a "Name" column with duplicate entries:

ID    Name
---   ----
 1    AAA
 2    AAA
 3    AAA
 4    BBB
 5    CCC
 6    CCC
 7    DDD
 8    DDD
 9    DDD
10    DDD
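
To reproduce the sample data, a minimal setup sketch might look like the following. The table name dbo.NameDemo and the column types are assumptions made for illustration (the question only uses the placeholder name "Table"), and the multi-row VALUES syntax assumes SQL Server 2008 or later.

```sql
-- Hypothetical table name and column types; the question only calls it "Table".
CREATE TABLE dbo.NameDemo
(
    ID   int         NOT NULL PRIMARY KEY,
    Name varchar(50) NOT NULL
);

-- Same ten rows as the listing above.
INSERT INTO dbo.NameDemo (ID, Name)
VALUES ( 1, 'AAA'), ( 2, 'AAA'), ( 3, 'AAA'),
       ( 4, 'BBB'),
       ( 5, 'CCC'), ( 6, 'CCC'),
       ( 7, 'DDD'), ( 8, 'DDD'), ( 9, 'DDD'), (10, 'DDD');
```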

Doing a GROUP BY like SELECT Name, COUNT(*) AS [Count] FROM Table GROUP BY Name results in this:

Name  Count
----  -----
AAA   3
BBB   1
CCC   2
DDD   4

I'm only concerned about the duplicates, so I'll add a HAVING clause, SELECT Name, COUNT(*) AS [Count] FROM Table GROUP BY Name HAVING COUNT(*) > 1:

Name  Count
----  -----
AAA   3
CCC   2
DDD   4

Trivial so far, but now things get tricky: I need a query to get me all the duplicate records, but with a nice incrementing indicator added to the Name column. The result should look something like this:

ID    Name
---   --------
 1    AAA
 2    AAA (2)
 3    AAA (3)
 5    CCC 
 6    CCC (2)
 7    DDD 
 8    DDD (2)
 9    DDD (3)
10    DDD (4)

Note row 4 with "BBB" is excluded, and the first duplicate keeps the original Name.

Using an EXISTS statement gives me all the records I need, but how do I go about creating the new Name value?

SELECT * FROM Table AS T1 
WHERE EXISTS (
    SELECT Name, COUNT(*) AS [Count] 
    FROM Table 
    GROUP BY Name 
    HAVING (COUNT(*) > 1) AND (Name = T1.Name))
ORDER BY Name

I need to create an UPDATE statement that will fix all the duplicates, i.e. change the Name as per this pattern.

Update: Figured it out now. It was the PARTITION BY clause I was missing.

Jakob Gade asked Mar 03 '11 03:03




2 Answers

With Dups As
    (
    Select Id, Name
        , Row_Number() Over ( Partition By Name Order By Id ) As Rnk
    From Table
    )
Select D.Id
    , D.Name + Case
                When D.Rnk > 1 Then ' (' + Cast(D.Rnk As varchar(10)) + ')'
                Else ''
                End As Name
From Dups As D

If you want an update statement you can use pretty much the same structure:

With Dups As
    (
    Select Id, Name
        , Row_Number() Over ( Partition By Name Order By Id ) As Rnk
    From Table
    )
Update T
Set Name = T.Name + Case
                    When D.Rnk > 1 Then ' (' + Cast(D.Rnk As varchar(10)) + ')'
                    Else ''
                    End
From Table As T
    Join Dups As D
        On D.Id = T.Id
Thomas answered Sep 28 '22 02:09
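
The question asks for names that occur only once (such as "BBB") to be left out of the result. One possible way to get that, sketched here under assumptions, is to add a windowed COUNT(*) next to Row_Number() and filter on it. The table name dbo.NameDemo is hypothetical and stands in for the placeholder "Table"; the Cnt column is likewise introduced only for this illustration.

```sql
-- dbo.NameDemo is a hypothetical name standing in for the placeholder "Table".
WITH Dups AS
(
    SELECT Id, Name,
           ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Id) AS Rnk,
           COUNT(*)     OVER (PARTITION BY Name)             AS Cnt  -- rows sharing this Name
    FROM dbo.NameDemo
)
SELECT Id,
       Name + CASE
                  WHEN Rnk > 1 THEN ' (' + CAST(Rnk AS varchar(10)) + ')'
                  ELSE ''
              END AS Name
FROM Dups
WHERE Cnt > 1      -- keeps only names that occur more than once
ORDER BY Id;
```

Filtering on Cnt rather than Rnk keeps the first row of each duplicate group in the output, which a plain "Rnk > 1" filter would drop.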



Just update the subquery directly:

update d
set Name = Name + ' (' + cast(r as varchar(10)) + ')'
from    (   select  Name,
                    -- order by ID so the lowest ID in each group keeps its original Name
                    row_number() over (partition by Name order by ID) as r
            from    [table]
        ) d
where r > 1
nathan_jr answered Sep 28 '22 01:09
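
Before committing a rename like this, it can help to run it inside a transaction and re-check for duplicates afterwards. The following is only a sketch under assumptions: dbo.NameDemo is a hypothetical table name standing in for the placeholder, and the update targets a common table expression rather than a derived table. The check can still return rows if a generated name such as "AAA (2)" already existed in the table.

```sql
BEGIN TRANSACTION;

-- dbo.NameDemo is a hypothetical name standing in for the placeholder table.
WITH Dups AS
(
    SELECT Name,
           ROW_NUMBER() OVER (PARTITION BY Name ORDER BY ID) AS Rnk
    FROM dbo.NameDemo
)
UPDATE Dups
SET Name = Name + ' (' + CAST(Rnk AS varchar(10)) + ')'
WHERE Rnk > 1;

-- Re-run the duplicate check from the question against the renamed rows.
SELECT Name, COUNT(*) AS [Count]
FROM dbo.NameDemo
GROUP BY Name
HAVING COUNT(*) > 1;

-- Rolled back here so the sketch leaves the data unchanged;
-- replace with COMMIT TRANSACTION once the result looks right.
ROLLBACK TRANSACTION;
```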
