Can I delete database duplicates based on multiple columns?

Question

I asked this question a while back to delete duplicate records based on a column. The answer worked great:

delete from tbl
where id NOT in
(
select  min(id)
from tbl
group by sourceid
)

I now have a simillar situation but the definition of duplicate record is based on multiple columns. How can I alter this above SQL to identify duplicate records where a unique record is define as concatenated from Col1 + Col2 + Col3. Would i just do something like this ?

delete from tbl
where id NOT in
(
select  min(id)
from tbl
group by col1, col2, col3
)

Aaron Bertrand · Accepted Answer

This shows the rows you want to keep:

;WITH x AS 
(
  SELECT col1, col2, col3, rn = ROW_NUMBER() OVER 
      (PARTITION BY col1, col2, col3 ORDER BY id)
  FROM dbo.tbl
)
SELECT col1, col2, col3 FROM x WHERE rn = 1;

This shows the rows you want to delete:

;WITH x AS 
(
  SELECT col1, col2, col3, rn = ROW_NUMBER() OVER 
      (PARTITION BY col1, col2, col3 ORDER BY id)
  FROM dbo.tbl
)
SELECT col1, col2, col3 FROM x WHERE rn > 1;

And once you're happy that the above two sets are correct, the following will actually delete them:

;WITH x AS 
(
  SELECT col1, col2, col3, rn = ROW_NUMBER() OVER 
      (PARTITION BY col1, col2, col3 ORDER BY id)
  FROM dbo.tbl
)
DELETE x WHERE rn > 1;

Note that in all three queries, the first 6 lines are identical, and only the subsequent query after the CTE has changed.

Can I delete database duplicates based on multiple columns?

Tags:

database

sql-server

leora

1 Answers

Aaron Bertrand

Recent Activity

Donate For Us

Can I delete database duplicates based on multiple columns?

Tags:

database

sql-server

leora

1 Answers

Aaron Bertrand

Related questions

Recent Activity

Donate For Us