How to find duplicate values in SQL Server

Tags: sql-server, duplicates, sql-server-2008

I'm using SQL Server 2008. I have a table:

Customers
    customer_number int
    field1 varchar
    field2 varchar
    field3 varchar
    field4 varchar
    ... and many more columns that don't matter for my queries.

Column customer_number is the primary key. I'm trying to find duplicate values and some differences between them.

Please help me find all rows that have:

1) the same field1, field2, field3, and field4

2) only 3 columns equal and the fourth different (excluding rows from list 1)

3) only 2 columns equal and the other two different (excluding rows from lists 1 and 2)

In the end, I'll have 3 tables with these results, each with an additional groupId that is the same for every row in a group of similar rows (for example, in the three-equal-columns table, rows that share the same values in the same three columns form a separate group).

Thank you.


asked May 20 '10 09:05

hgulyan

2 Answers

Here's a handy query for finding duplicates in a table. Suppose you want to find all email addresses that appear more than once in a table:


SELECT email, COUNT(email) AS NumOccurrences
FROM users
GROUP BY email
HAVING ( COUNT(email) > 1 )

You could also use this technique to find email addresses that occur exactly once:


SELECT email
FROM users
GROUP BY email
HAVING ( COUNT(email) = 1 )
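Both queries can be tried without a SQL Server instance. The sketch below runs them against an in-memory SQLite database through Python's built-in sqlite3 module; the `users` table, its `id` column, and the sample addresses are invented for illustration, and the ORDER BY clauses are there only to make the output order predictable.

```python
import sqlite3


def email_counts(emails):
    """Return (duplicated, unique) for a list of sample email addresses."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany("INSERT INTO users (email) VALUES (?)", [(e,) for e in emails])

    # Emails that appear more than once, with how often they appear.
    duplicated = conn.execute(
        """SELECT email, COUNT(email) AS NumOccurrences
           FROM users
           GROUP BY email
           HAVING COUNT(email) > 1
           ORDER BY email"""
    ).fetchall()

    # Emails that appear exactly once.
    unique = [row[0] for row in conn.execute(
        """SELECT email
           FROM users
           GROUP BY email
           HAVING COUNT(email) = 1
           ORDER BY email"""
    )]
    conn.close()
    return duplicated, unique


if __name__ == "__main__":
    dups, singles = email_counts(
        ["a@example.com", "b@example.com", "a@example.com",
         "c@example.com", "a@example.com"]
    )
    print(dups)     # [('a@example.com', 3)]
    print(singles)  # ['b@example.com', 'c@example.com']
```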

answered Sep 16 '22 12:09

Balaji Birajdar

The easiest approach would probably be a stored procedure that iterates over each group of customers with duplicates and inserts the matching customers under that group's number.
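The iterate-over-groups idea can be sketched outside the database. The snippet below is plain Python standing in for a T-SQL stored procedure, and the sample tuples are hypothetical: it buckets customers by their four field values and numbers each bucket that holds more than one customer, which is the same kind of grouping the set-based query produces.

```python
from collections import defaultdict


def four_field_groups(customers):
    """Group customers whose four fields are all equal.

    customers: iterable of (customer_number, field1, field2, field3, field4),
    assumed to contain no None values (so the keys can be sorted).
    Returns (group_no, customer_number) pairs, one per duplicated customer.
    """
    buckets = defaultdict(list)
    for number, *fields in customers:
        buckets[tuple(fields)].append(number)

    result = []
    group_no = 0
    # Visit combinations in sorted order so group numbers are deterministic.
    for key in sorted(buckets):
        members = buckets[key]
        if len(members) > 1:  # a duplicate needs at least two customers
            group_no += 1
            result.extend((group_no, n) for n in sorted(members))
    return result


if __name__ == "__main__":
    sample = [
        (1, "a", "b", "c", "d"),
        (2, "a", "b", "c", "d"),
        (3, "a", "b", "c", "x"),
        (4, "e", "f", "g", "h"),
        (5, "e", "f", "g", "h"),
    ]
    print(four_field_groups(sample))  # [(1, 1), (1, 2), (2, 4), (2, 5)]
```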

However, after thinking about it, you can probably do this with a subquery. Hopefully I haven't made it more complicated than it needs to be, but this should get you what you're looking for in the first table of duplicates (all four fields). Note that this is untested, so it might need a little tweaking.

Basically, it finds each combination of the four fields that occurs more than once and gives it a group number, then joins back to Customers to fetch every customer with that combination and assigns them the same group number.


INSERT INTO FourFieldsDuplicates(group_no, customer_no)
SELECT Groups.group_no, custs.customer_number
FROM (SELECT ROW_NUMBER() OVER(ORDER BY c.field1, c.field2, c.field3, c.field4) AS group_no,
             c.field1, c.field2, c.field3, c.field4
      FROM Customers c
      GROUP BY c.field1, c.field2, c.field3, c.field4
      HAVING COUNT(*) > 1) Groups
INNER JOIN Customers custs ON custs.field1 = Groups.field1
                           AND custs.field2 = Groups.field2
                           AND custs.field3 = Groups.field3
                           AND custs.field4 = Groups.field4
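As a quick sanity check, here is the four-field query run against a throwaway SQLite database from Python's sqlite3 module. It is an adaptation rather than the exact T-SQL: the derived table is aliased `g` for brevity, window functions such as ROW_NUMBER need SQLite 3.25 or later, the results table is assumed to have the answer's (group_no, customer_no) columns, and the sample customers are made up.

```python
import sqlite3

FOUR_FIELD_SQL = """
INSERT INTO FourFieldsDuplicates (group_no, customer_no)
SELECT g.group_no, custs.customer_number
FROM (SELECT ROW_NUMBER() OVER (ORDER BY c.field1, c.field2,
                                         c.field3, c.field4) AS group_no,
             c.field1, c.field2, c.field3, c.field4
      FROM Customers c
      GROUP BY c.field1, c.field2, c.field3, c.field4
      HAVING COUNT(*) > 1) g
INNER JOIN Customers custs ON custs.field1 = g.field1
                           AND custs.field2 = g.field2
                           AND custs.field3 = g.field3
                           AND custs.field4 = g.field4
"""


def four_field_duplicates(rows):
    """Run the four-field query on sample (customer_number, field1..field4)
    rows and return the (group_no, customer_no) pairs it inserts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Customers (customer_number INTEGER PRIMARY KEY,
                                field1 TEXT, field2 TEXT, field3 TEXT, field4 TEXT);
        CREATE TABLE FourFieldsDuplicates (group_no INTEGER, customer_no INTEGER);
    """)
    conn.executemany("INSERT INTO Customers VALUES (?, ?, ?, ?, ?)", rows)
    conn.execute(FOUR_FIELD_SQL)
    result = conn.execute(
        "SELECT group_no, customer_no FROM FourFieldsDuplicates "
        "ORDER BY group_no, customer_no").fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [
        (1, "a", "b", "c", "d"), (2, "a", "b", "c", "d"),
        (3, "a", "b", "c", "x"),  # differs in field4, so not a four-field duplicate
        (4, "e", "f", "g", "h"), (5, "e", "f", "g", "h"),
    ]
    print(four_field_duplicates(sample))  # [(1, 1), (1, 2), (2, 4), (2, 5)]
```

Because the join uses plain equality, customers with a NULL in any of the four fields are never matched, even though GROUP BY puts those NULLs into one group; SQL Server behaves the same way under the default ANSI_NULLS ON setting.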

The other ones are a bit more complicated, however, as you'll need to expand out the possibilities: one branch for each column that is allowed to differ. The three-field groups would then be:


INSERT INTO ThreeFieldsDuplicates(group_no, customer_no)
SELECT Groups.group_no, custs.customer_number
FROM (SELECT ROW_NUMBER() OVER(ORDER BY GroupsInner.field1, GroupsInner.field2,
                                        GroupsInner.field3, GroupsInner.field4) AS group_no,
             GroupsInner.field1, GroupsInner.field2,
             GroupsInner.field3, GroupsInner.field4
      -- One branch per column that may differ (returned as NULL). Each branch
      -- keeps only combinations shared by more than one customer that is not
      -- already in a four-field group.
      FROM (SELECT c.field1, c.field2, c.field3, NULL AS field4
            FROM Customers c
            WHERE NOT EXISTS(SELECT d.customer_no
                             FROM FourFieldsDuplicates d
                             WHERE d.customer_no = c.customer_number)
            GROUP BY c.field1, c.field2, c.field3
            HAVING COUNT(*) > 1
            UNION ALL
            SELECT c.field1, c.field2, NULL AS field3, c.field4
            FROM Customers c
            WHERE NOT EXISTS(SELECT d.customer_no
                             FROM FourFieldsDuplicates d
                             WHERE d.customer_no = c.customer_number)
            GROUP BY c.field1, c.field2, c.field4
            HAVING COUNT(*) > 1
            UNION ALL
            SELECT c.field1, NULL AS field2, c.field3, c.field4
            FROM Customers c
            WHERE NOT EXISTS(SELECT d.customer_no
                             FROM FourFieldsDuplicates d
                             WHERE d.customer_no = c.customer_number)
            GROUP BY c.field1, c.field3, c.field4
            HAVING COUNT(*) > 1
            UNION ALL
            SELECT NULL AS field1, c.field2, c.field3, c.field4
            FROM Customers c
            WHERE NOT EXISTS(SELECT d.customer_no
                             FROM FourFieldsDuplicates d
                             WHERE d.customer_no = c.customer_number)
            GROUP BY c.field2, c.field3, c.field4
            HAVING COUNT(*) > 1) GroupsInner) Groups
INNER JOIN Customers custs ON (Groups.field1 IS NULL OR custs.field1 = Groups.field1)
                           AND (Groups.field2 IS NULL OR custs.field2 = Groups.field2)
                           AND (Groups.field3 IS NULL OR custs.field3 = Groups.field3)
                           AND (Groups.field4 IS NULL OR custs.field4 = Groups.field4)
-- Skip customers that are already in a four-field group.
WHERE NOT EXISTS(SELECT d.customer_no
                 FROM FourFieldsDuplicates d
                 WHERE d.customer_no = custs.customer_number)
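To check the three-field logic end to end, here is a sketch that runs it against SQLite through Python's sqlite3 module. It is an adaptation, not the exact T-SQL: the derived tables are aliased `g` and `gi`, the four UNION ALL branches are assembled from a small list, FourFieldsDuplicates is pre-filled by hand as if the four-field query had already run, and the sample customers are made up. Each branch filters with HAVING COUNT(*) > 1, and the final join skips customers already in a four-field group.

```python
import sqlite3

SETUP = """
CREATE TABLE Customers (customer_number INTEGER PRIMARY KEY,
                        field1 TEXT, field2 TEXT, field3 TEXT, field4 TEXT);
CREATE TABLE FourFieldsDuplicates (group_no INTEGER, customer_no INTEGER);
CREATE TABLE ThreeFieldsDuplicates (group_no INTEGER, customer_no INTEGER);
"""


def not_in_four(alias):
    """SQL condition: the customer is not already in a four-field group."""
    return ("NOT EXISTS (SELECT 1 FROM FourFieldsDuplicates d "
            f"WHERE d.customer_no = {alias}.customer_number)")


# (select list, grouping columns); the NULL column is the one allowed to differ.
BRANCHES = [
    ("field1, field2, field3, NULL AS field4", "field1, field2, field3"),
    ("field1, field2, NULL AS field3, field4", "field1, field2, field4"),
    ("field1, NULL AS field2, field3, field4", "field1, field3, field4"),
    ("NULL AS field1, field2, field3, field4", "field2, field3, field4"),
]

# HAVING sits inside each branch, so only combinations shared by 2+ customers survive.
INNER = "\nUNION ALL\n".join(
    f"SELECT {cols} FROM Customers c WHERE {not_in_four('c')} "
    f"GROUP BY {keys} HAVING COUNT(*) > 1"
    for cols, keys in BRANCHES
)

THREE = f"""
INSERT INTO ThreeFieldsDuplicates (group_no, customer_no)
SELECT g.group_no, custs.customer_number
FROM (SELECT ROW_NUMBER() OVER (ORDER BY field1, field2, field3, field4) AS group_no,
             field1, field2, field3, field4
      FROM ({INNER}) gi) g
JOIN Customers custs ON (g.field1 IS NULL OR custs.field1 = g.field1)
                    AND (g.field2 IS NULL OR custs.field2 = g.field2)
                    AND (g.field3 IS NULL OR custs.field3 = g.field3)
                    AND (g.field4 IS NULL OR custs.field4 = g.field4)
WHERE {not_in_four('custs')}
"""


def three_field_groups(rows, four_field_pairs):
    """Load customers and existing four-field (group_no, customer_no) pairs,
    run the three-field query, and return its rows in a stable order."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    conn.executemany("INSERT INTO Customers VALUES (?, ?, ?, ?, ?)", rows)
    conn.executemany("INSERT INTO FourFieldsDuplicates VALUES (?, ?)", four_field_pairs)
    conn.execute(THREE)
    result = conn.execute(
        "SELECT group_no, customer_no FROM ThreeFieldsDuplicates "
        "ORDER BY group_no, customer_no").fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [
        (1, "a", "b", "c", "d"), (2, "a", "b", "c", "d"),  # all four equal
        (3, "a", "b", "c", "x"), (4, "a", "b", "c", "y"),  # only field4 differs
        (5, "p", "q", "r", "s"), (6, "p", "q", "t", "s"),  # only field3 differs
        (7, "m", "n", "o", "z"),                           # matches nobody
    ]
    # Pretend the four-field query already put customers 1 and 2 in group 1.
    print(three_field_groups(sample, [(1, 1), (1, 2)]))
    # [(1, 3), (1, 4), (2, 5), (2, 6)]
```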

Hopefully this produces the right results and I'll leave the last one as an exercise. :-D
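For the two-field case, writing the branches by hand gets tedious: there are C(4,2) = 6 ways to pick the two matching columns, against 4 for the three-field case. A hypothetical helper like the one below (Python, not part of the answer) could generate the inner UNION ALL text instead. It only builds the branch SQL, and it assumes the same Customers table plus results tables with a customer_no column, as in the queries above; the outer ROW_NUMBER, join, and exclusion would still wrap it the same way.

```python
from itertools import combinations

FIELDS = ("field1", "field2", "field3", "field4")


def build_branches(k, exclude_tables):
    """Return the UNION ALL of one SELECT per k-column subset of FIELDS.

    Columns outside the subset become NULL placeholders (the columns allowed
    to differ). Customers already listed in any of exclude_tables are skipped.
    """
    branches = []
    for subset in combinations(FIELDS, k):
        select_list = ", ".join(
            f"c.{f}" if f in subset else f"NULL AS {f}" for f in FIELDS)
        group_list = ", ".join(f"c.{f}" for f in subset)
        # One NOT EXISTS per earlier results table (lists 1 and 2).
        filters = " AND ".join(
            f"NOT EXISTS(SELECT 1 FROM {t} d "
            f"WHERE d.customer_no = c.customer_number)"
            for t in exclude_tables)
        where = f"\nWHERE {filters}" if filters else ""
        branches.append(
            f"SELECT {select_list}\nFROM Customers c{where}\n"
            f"GROUP BY {group_list}\nHAVING COUNT(*) > 1")
    return "\nUNION ALL\n".join(branches)


if __name__ == "__main__":
    # Two equal columns, skipping customers already in the 4- and 3-field tables.
    print(build_branches(2, ["FourFieldsDuplicates", "ThreeFieldsDuplicates"]))
```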

answered Sep 16 '22 12:09

lc.

