I asked this question a while back to delete duplicate records based on a column. The answer worked great:
delete from tbl
where id NOT in
(
select min(id)
from tbl
group by sourceid
)
I now have a similar situation, but the definition of a duplicate record is based on multiple columns. How can I alter the SQL above to identify duplicate records where a unique record is defined by the combination of Col1 + Col2 + Col3? Would I just do something like this?
delete from tbl
where id NOT in
(
select min(id)
from tbl
group by col1, col2, col3
)
This shows the rows you want to keep:
;WITH x AS
(
SELECT col1, col2, col3, rn = ROW_NUMBER() OVER
(PARTITION BY col1, col2, col3 ORDER BY id)
FROM dbo.tbl
)
SELECT col1, col2, col3 FROM x WHERE rn = 1;
This shows the rows you want to delete:
;WITH x AS
(
SELECT col1, col2, col3, rn = ROW_NUMBER() OVER
(PARTITION BY col1, col2, col3 ORDER BY id)
FROM dbo.tbl
)
SELECT col1, col2, col3 FROM x WHERE rn > 1;
And once you're happy that the above two sets are correct, the following will actually delete them:
;WITH x AS
(
SELECT col1, col2, col3, rn = ROW_NUMBER() OVER
(PARTITION BY col1, col2, col3 ORDER BY id)
FROM dbo.tbl
)
DELETE x WHERE rn > 1;
Note that in all three queries the first six lines (the CTE) are identical; only the statement that follows the CTE changes.
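For what it's worth, the GROUP BY form from the question can be sanity-checked on throwaway data before touching a real table. The following is only a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory table. The helper name `dedupe` and the sample rows are made up for illustration, and SQLite is not SQL Server, so treat it as a check of the logic rather than of T-SQL behaviour.

```python
import sqlite3


def dedupe(rows):
    """Load rows into a scratch table, delete duplicates, return the survivors.

    Hypothetical helper: a duplicate is any row sharing (col1, col2, col3)
    with a row that has a lower id.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tbl (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 TEXT)"
    )
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", rows)

    # Same shape as the query in the question: keep the lowest id per group.
    conn.execute(
        """
        DELETE FROM tbl
        WHERE id NOT IN (
            SELECT MIN(id)
            FROM tbl
            GROUP BY col1, col2, col3
        )
        """
    )

    survivors = conn.execute(
        "SELECT id, col1, col2, col3 FROM tbl ORDER BY id"
    ).fetchall()
    conn.close()
    return survivors


# Made-up sample data: ids 2 and 4 duplicate id 1, id 5 duplicates id 3.
sample = [
    (1, "a", "b", "c"),
    (2, "a", "b", "c"),
    (3, "a", "b", "x"),
    (4, "a", "b", "c"),
    (5, "a", "b", "x"),
]

print(dedupe(sample))  # [(1, 'a', 'b', 'c'), (3, 'a', 'b', 'x')]
```

On this sample, one row per distinct (col1, col2, col3) combination survives, which is the same set the rn = 1 query above would return for the same data.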