
How to delete duplicate rows in SQL Server?

How can I delete duplicate rows where no unique row id exists?

My table is

col1   col2  col3  col4  col5  col6  col7
john   1     1     1     1     1     1
john   1     1     1     1     1     1
sally  2     2     2     2     2     2
sally  2     2     2     2     2     2

I want to be left with the following after the duplicate removal:

col1   col2  col3  col4  col5  col6  col7
john   1     1     1     1     1     1
sally  2     2     2     2     2     2

I've tried a few queries, but I think they depend on having a row id, because I don't get the desired result. For example:

DELETE FROM table
WHERE col1 IN (
    SELECT id
    FROM table
    GROUP BY id
    HAVING (COUNT(col1) > 1)
)
Fearghal asked Aug 22 '13 20:08



1 Answer

I like CTEs and ROW_NUMBER because the two combined let us see which rows will be deleted (or updated) before anything is changed: just replace the DELETE FROM CTE... with SELECT * FROM CTE. The delete itself looks like this:

WITH CTE AS (
    SELECT [col1], [col2], [col3], [col4], [col5], [col6], [col7],
           RN = ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col1)
    FROM dbo.Table1
)
DELETE FROM CTE
WHERE RN > 1

DEMO (result is different; I assume that it's due to a typo on your part)

COL1   COL2  COL3  COL4  COL5  COL6  COL7
john   1     1     1     1     1     1
sally  2     2     2     2     2     2
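
The preview mentioned earlier (swapping the DELETE for a SELECT) could look like the sketch below. It assumes the same dbo.Table1 and column names used in the query above and only reads data, so it should be safe to run before the delete.

```sql
-- Preview only: lists the rows the DELETE would remove, without changing the table.
-- dbo.Table1 and col1..col7 are the names assumed in the query above.
WITH CTE AS (
    SELECT [col1], [col2], [col3], [col4], [col5], [col6], [col7],
           RN = ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col1)
    FROM dbo.Table1
)
SELECT *
FROM CTE
WHERE RN > 1;  -- every row numbered 2 or higher within its col1 group
```

Dropping the WHERE clause shows every row with its RN value, which makes it easier to check that the partitioning groups the rows as intended.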

This example determines duplicates by a single column, col1, because of the PARTITION BY col1. If you want to include multiple columns, simply add them to the PARTITION BY:

ROW_NUMBER() OVER (PARTITION BY Col1, Col2, ... ORDER BY OrderColumn)
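
For the table in the question, where a row counts as a duplicate only if all seven columns match, the partition could list every column. This is a minimal sketch under that assumption, again using the dbo.Table1 name from above. Because the rows in each group are identical, it does not matter which one ROW_NUMBER happens to number first, so ordering by col1 is just a placeholder.

```sql
-- Sketch: treat rows as duplicates only when all seven columns are equal.
-- dbo.Table1 is the table name assumed in the answer; adjust to your own table.
WITH CTE AS (
    SELECT [col1], [col2], [col3], [col4], [col5], [col6], [col7],
           RN = ROW_NUMBER() OVER (
                    PARTITION BY col1, col2, col3, col4, col5, col6, col7
                    ORDER BY col1)  -- order is arbitrary: rows in a group are identical
    FROM dbo.Table1
)
DELETE FROM CTE
WHERE RN > 1;  -- keeps exactly one row per distinct combination
```

If one particular copy should survive (for example the most recent), replace the ORDER BY column with one that ranks the rows accordingly, so the row to keep gets RN = 1.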
Tim Schmelter answered Sep 21 '22 13:09