How can I delete duplicate rows where no unique row id
exists?
My table is
col1 col2 col3 col4 col5 col6 col7 john 1 1 1 1 1 1 john 1 1 1 1 1 1 sally 2 2 2 2 2 2 sally 2 2 2 2 2 2
I want to be left with the following after the duplicate removal:
john 1 1 1 1 1 1 sally 2 2 2 2 2 2
I've tried a few queries but I think they depend on having a row id as I don't get the desired result. For example:
DELETE FROM table WHERE col1 IN ( SELECT id FROM table GROUP BY id HAVING (COUNT(col1) > 1) )
Select the range you want to remove duplicate rows. If you want to delete all duplicate rows in the worksheet, just hold down Ctrl + A key to select the entire sheet. 2. On Data tab, click Remove Duplicates in the Data Tools group.
One way to delete the duplicate rows but retaining the latest ones is by using MAX() function and GROUP BY clause.
Introduction to SQL DISTINCT operator Note that the DISTINCT only removes the duplicate rows from the result set. It doesn't delete duplicate rows in the table. If you want to select two columns and remove duplicates in one column, you should use the GROUP BY clause instead.
I like CTEs and ROW_NUMBER
as the two combined allow us to see which rows are deleted (or updated), therefore just change the DELETE FROM CTE...
to SELECT * FROM CTE
:
WITH CTE AS( SELECT [col1], [col2], [col3], [col4], [col5], [col6], [col7], RN = ROW_NUMBER()OVER(PARTITION BY col1 ORDER BY col1) FROM dbo.Table1 ) DELETE FROM CTE WHERE RN > 1
DEMO (result is different; I assume that it's due to a typo on your part)
COL1 COL2 COL3 COL4 COL5 COL6 COL7 john 1 1 1 1 1 1 sally 2 2 2 2 2 2
This example determines duplicates by a single column col1
because of the PARTITION BY col1
. If you want to include multiple columns simply add them to the PARTITION BY
:
ROW_NUMBER()OVER(PARTITION BY Col1, Col2, ... ORDER BY OrderColumn)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With