Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How get the T-SQL code to find duplicates?

MS Access has a button to generate sql code for finding duplicated rows. I don't know if SQL Server 2005/2008 Managment Studio has this.

  1. If it has, please point where

  2. If it has not, please tell me how can I have a T-SQL helper for creating code like this.

like image 632
Jader Dias Avatar asked Aug 03 '09 14:08

Jader Dias


People also ask

How do I select duplicate records in SQL?

To select duplicate values, you need to create groups of rows with the same values and then select the groups with counts greater than one. You can achieve that by using GROUP BY and a HAVING clause.

How do I filter duplicates in SQL?

The go to solution for removing duplicate rows from your result sets is to include the distinct keyword in your select statement. It tells the query engine to remove duplicates to produce a result set in which every row is unique.

How do I find duplicate names in a table in SQL?

To find the duplicate Names in the table, we have to follow these steps: Defining the criteria: At first, you need to define the criteria for finding the duplicate Names. You might want to search in a single column or more than that. Write the query: Then simply write the query to find the duplicate Names.


4 Answers

Well, if you have entire rows as duplicates in your table, you've at least not got a primary key set up for that table, otherwise at least the primary key value would be different.

However, here's how to build a SQL to get duplicates over a set of columns:

SELECT col1, col2, col3, col4 FROM table GROUP BY col1, col2, col3, col4 HAVING COUNT(*) > 1 

This will find rows which, for columns col1-col4, has the same combination of values, more than once.

For instance, in the following table, rows 2+3 would be duplicates:

PK    col1    col2    col3    col4    col5 1       1       2       3       4      6 2       1       3       4       7      7 3       1       3       4       7      10 4       2       3       1       4      5 

The two rows share common values in columns col1-col4, and thus, by that SQL, is considered duplicates. Expand the list of columns to contain all the columns you wish to analyze this for.

like image 109
Lasse V. Karlsen Avatar answered Oct 18 '22 12:10

Lasse V. Karlsen


If you're using SQL Server 2005+, you can use the following code to see all the rows along with other columns:

SELECT *, ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, col4 ORDER BY (SELECT 0)) AS DuplicateRowNumber FROM table 

Youd can also delete (or otherwise work with) duplicates using this technique:

WITH cte AS (SELECT *, ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, col4 ORDER BY (SELECT 0)) AS DuplicateRowNumber     FROM table ) DELETE FROM cte WHERE DuplicateRowNumber > 1 

ROW_NUMBER is extremely powerful - there is much you can do with it - see the BOL article on it at http://msdn.microsoft.com/en-us/library/ms186734.aspx

like image 44
Mike DeFehr Avatar answered Oct 18 '22 14:10

Mike DeFehr


I found this solution when I need to dump entire rows with one or more duplicate fields but I don't want to type every field name in the table:

SELECT * FROM db WHERE col IN
    (SELECT col FROM db GROUP BY col HAVING COUNT(*) > 1)
    ORDER BY col
like image 38
Ferruccio Avatar answered Oct 18 '22 12:10

Ferruccio


AFAIK, it doesn't. Just make a select statement grouping by all the fields of a table, and filtering using a having clause where the count is greater than 1.

If your rows are duplicated except by the key, then don't include the key in the select fields.

like image 29
eKek0 Avatar answered Oct 18 '22 12:10

eKek0