Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can I delete database duplicates based on multiple columns?

I asked this question a while back to delete duplicate records based on a column. The answer worked great:

delete from tbl
where id NOT in
(
select  min(id)
from tbl
group by sourceid
)

I now have a simillar situation but the definition of duplicate record is based on multiple columns. How can I alter this above SQL to identify duplicate records where a unique record is define as concatenated from Col1 + Col2 + Col3. Would i just do something like this ?

delete from tbl
where id NOT in
(
select  min(id)
from tbl
group by col1, col2, col3
)
like image 212
leora Avatar asked Jul 23 '12 14:07

leora


1 Answers

This shows the rows you want to keep:

;WITH x AS 
(
  SELECT col1, col2, col3, rn = ROW_NUMBER() OVER 
      (PARTITION BY col1, col2, col3 ORDER BY id)
  FROM dbo.tbl
)
SELECT col1, col2, col3 FROM x WHERE rn = 1;

This shows the rows you want to delete:

;WITH x AS 
(
  SELECT col1, col2, col3, rn = ROW_NUMBER() OVER 
      (PARTITION BY col1, col2, col3 ORDER BY id)
  FROM dbo.tbl
)
SELECT col1, col2, col3 FROM x WHERE rn > 1;

And once you're happy that the above two sets are correct, the following will actually delete them:

;WITH x AS 
(
  SELECT col1, col2, col3, rn = ROW_NUMBER() OVER 
      (PARTITION BY col1, col2, col3 ORDER BY id)
  FROM dbo.tbl
)
DELETE x WHERE rn > 1;

Note that in all three queries, the first 6 lines are identical, and only the subsequent query after the CTE has changed.

like image 101
Aaron Bertrand Avatar answered Sep 28 '22 19:09

Aaron Bertrand