What is the best way to remove duplicate rows from a fairly large SQL Server
table (i.e. 300,000+ rows)?
The rows, of course, will not be perfect duplicates because of the existence of the RowID
identity field.
MyTable
RowID int not null identity(1,1) primary key, Col1 varchar(20) not null, Col2 varchar(2048) not null, Col3 tinyint not null
Select the range of cells that has duplicate values you want to remove. Tip: Remove any outlines or subtotals from your data before trying to remove duplicates. Click Data > Remove Duplicates, and then Under Columns, check or uncheck the columns where you want to remove the duplicates.
To delete the duplicate rows from the table in SQL Server, you follow these steps: Find duplicate rows using GROUP BY clause or ROW_NUMBER() function. Use DELETE statement to remove the duplicate rows.
Remove Duplicate Rows in Excel Select the entire data. Go to Data –> Data Tools –> Remove Duplicates. In the Remove Duplicates dialog box: If your data has headers, make sure the 'My data has headers' option is checked.
Assuming no nulls, you GROUP BY
the unique columns, and SELECT
the MIN (or MAX)
RowId as the row to keep. Then, just delete everything that didn't have a row id:
DELETE FROM MyTable LEFT OUTER JOIN ( SELECT MIN(RowId) as RowId, Col1, Col2, Col3 FROM MyTable GROUP BY Col1, Col2, Col3 ) as KeepRows ON MyTable.RowId = KeepRows.RowId WHERE KeepRows.RowId IS NULL
In case you have a GUID instead of an integer, you can replace
MIN(RowId)
with
CONVERT(uniqueidentifier, MIN(CONVERT(char(36), MyGuidColumn)))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With