 

How can I remove duplicate rows?

What is the best way to remove duplicate rows from a fairly large SQL Server table (i.e. 300,000+ rows)?

The rows, of course, will not be perfect duplicates because of the existence of the RowID identity field.

MyTable

RowID int not null identity(1,1) primary key,
Col1 varchar(20) not null,
Col2 varchar(2048) not null,
Col3 tinyint not null
Seibar asked Aug 20 '08 21:08





1 Answer

Assuming no nulls, you GROUP BY the columns that define a duplicate and SELECT the MIN (or MAX) RowId in each group as the row to keep. Then just delete every row whose RowId is not one of the kept ones:

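-- The first MyTable names the delete target; the second FROM starts the join.
DELETE MyTable
FROM MyTable
LEFT OUTER JOIN (
    -- One surviving RowId per distinct (Col1, Col2, Col3) combination
    SELECT MIN(RowId) AS RowId, Col1, Col2, Col3
    FROM MyTable
    GROUP BY Col1, Col2, Col3
) AS KeepRows ON
    MyTable.RowId = KeepRows.RowId
WHERE
    -- Rows with no match in KeepRows are the duplicates
    KeepRows.RowId IS NULL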
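If you want to try the keep-the-lowest-RowID idea on throwaway data before running it against the real table, here is a minimal sketch using Python's built-in sqlite3 module. It is only an illustration, not the SQL Server statement itself. SQLite does not accept the T-SQL DELETE ... FROM ... JOIN form, so the same keep-set is written as a NOT IN subquery instead. The helper names build_demo and remove_duplicates and the sample rows are made up for this example.

```python
import sqlite3


def build_demo():
    """Create an in-memory MyTable with a few duplicate rows (sample data is invented)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE MyTable (
            RowID INTEGER PRIMARY KEY AUTOINCREMENT,
            Col1 TEXT NOT NULL,
            Col2 TEXT NOT NULL,
            Col3 INTEGER NOT NULL
        )
        """
    )
    conn.executemany(
        "INSERT INTO MyTable (Col1, Col2, Col3) VALUES (?, ?, ?)",
        [
            ("a", "x", 1),  # RowID 1, kept
            ("a", "x", 1),  # RowID 2, duplicate of 1
            ("b", "y", 2),  # RowID 3, kept
            ("a", "x", 1),  # RowID 4, duplicate of 1
            ("b", "y", 3),  # RowID 5, kept (Col3 differs from RowID 3)
        ],
    )
    conn.commit()
    return conn


def remove_duplicates(conn):
    """Delete every row except the lowest RowID in each (Col1, Col2, Col3) group.

    Returns the number of rows deleted.
    """
    cur = conn.execute(
        """
        DELETE FROM MyTable
        WHERE RowID NOT IN (
            SELECT MIN(RowID)
            FROM MyTable
            GROUP BY Col1, Col2, Col3
        )
        """
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = build_demo()
    deleted = remove_duplicates(conn)
    remaining = [row[0] for row in conn.execute("SELECT RowID FROM MyTable ORDER BY RowID")]
    print(deleted, remaining)  # prints: 2 [1, 3, 5]
```

Running the delete a second time removes nothing, which is a quick way to confirm that no duplicates are left.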

In case you have a GUID instead of an integer, you can replace

MIN(RowId) 

with

CONVERT(uniqueidentifier, MIN(CONVERT(char(36), MyGuidColumn))) 
Mark Brackett answered Oct 06 '22 09:10
