I have to clean a table with duplicate rows:
id: serial id
gid: group id
url: string <- this is the column that I have to clean up
One gid may have multiple url values:
id   gid  url
---- ---- ------------
1    12   www.gmail.com
2    12   www.some.com
3    12   www.some.com      <-- duplicate
4    13   www.other.com
5    13   www.milfsome.com  <-- not a duplicate
I want to execute one query against the entire table and delete all rows where the combination of gid and url is duplicated. In the above sample, after the delete, I want to have only rows 1, 2, 4 and 5 remaining.
;WITH x AS
(
  SELECT id, gid, url, rn = ROW_NUMBER() OVER
    (PARTITION BY gid, url ORDER BY id)
  FROM dbo.[table] -- placeholder name; TABLE is a reserved word, so bracket it or use your real table name
)
SELECT id,gid,url FROM x WHERE rn = 1 -- the rows you'll keep
-- SELECT id,gid,url FROM x WHERE rn > 1 -- the rows you'll delete
-- DELETE x WHERE rn > 1; -- do the delete
Once you're happy with the first select, which indicates the rows you'll keep, remove it and un-comment the second select. Once you're happy with that, which indicates the rows you'll delete, remove it and un-comment the delete.
And if you don't want to delete any data, just ignore the commented-out lines under the SELECT.
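To try the keep / delete preview steps without touching a real database, here is a minimal sketch using Python's built-in sqlite3 module. Assumptions: it runs against an in-memory SQLite database rather than SQL Server, the table name links is made up for the example, and the SQLite build supports window functions (3.25 or later). The sample rows are the ones from the question.

```python
import sqlite3

# Sample rows copied from the question; "links" is a hypothetical table name.
ROWS = [
    (1, 12, "www.gmail.com"),
    (2, 12, "www.some.com"),
    (3, 12, "www.some.com"),
    (4, 13, "www.other.com"),
    (5, 13, "www.milfsome.com"),
]

# Number the rows inside each (gid, url) group, lowest id first.
NUMBERED = """
    SELECT id, ROW_NUMBER() OVER (PARTITION BY gid, url ORDER BY id) AS rn
    FROM links
"""


def preview_then_delete(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE links (id INTEGER PRIMARY KEY, gid INTEGER, url TEXT)")
    conn.executemany("INSERT INTO links VALUES (?, ?, ?)", rows)

    # Step 1: the rows you'll keep (first row of every group).
    keep = [r[0] for r in conn.execute(
        f"SELECT id FROM ({NUMBERED}) WHERE rn = 1 ORDER BY id")]

    # Step 2: the rows you'll delete (every later row of a group).
    drop = [r[0] for r in conn.execute(
        f"SELECT id FROM ({NUMBERED}) WHERE rn > 1 ORDER BY id")]

    # Step 3: do the delete, then read back what is left.
    conn.execute(
        f"DELETE FROM links WHERE id IN (SELECT id FROM ({NUMBERED}) WHERE rn > 1)")
    remaining = [r[0] for r in conn.execute("SELECT id FROM links ORDER BY id")]

    conn.close()
    return keep, drop, remaining


keep, drop, remaining = preview_then_delete(ROWS)
print("keep:", keep)
print("delete:", drop)
print("remaining:", remaining)
```

SQLite does not allow deleting directly from a CTE the way T-SQL's DELETE x does, so the sketch deletes by id instead. The partitioning logic is the same.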
...
SELECT
MIN(id) AS id,
gid,
url
FROM yourTable
GROUP BY gid, url
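The GROUP BY query above only selects the surviving rows. If you do want to delete with the same idea, one possible approach (not part of the original answer) is to remove every row whose id is not the MIN(id) of its (gid, url) group. A small sketch with Python's built-in sqlite3 module, assuming an in-memory SQLite database instead of SQL Server and the sample data from the question:

```python
import sqlite3

# Sample rows copied from the question.
SAMPLE = [
    (1, 12, "www.gmail.com"),
    (2, 12, "www.some.com"),
    (3, 12, "www.some.com"),
    (4, 13, "www.other.com"),
    (5, 13, "www.milfsome.com"),
]


def dedupe_with_group_by(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourTable (id INTEGER PRIMARY KEY, gid INTEGER, url TEXT)")
    conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", rows)

    # Non-destructive view: one row per (gid, url), keeping the lowest id.
    survivors = conn.execute(
        "SELECT MIN(id) AS id, gid, url FROM yourTable "
        "GROUP BY gid, url ORDER BY MIN(id)"
    ).fetchall()

    # Destructive variant (an assumption, not from the answer):
    # delete every row that is not the lowest id of its group.
    conn.execute(
        "DELETE FROM yourTable WHERE id NOT IN "
        "(SELECT MIN(id) FROM yourTable GROUP BY gid, url)"
    )
    left = conn.execute("SELECT id, gid, url FROM yourTable ORDER BY id").fetchall()

    conn.close()
    return survivors, left


survivors, left = dedupe_with_group_by(SAMPLE)
print(survivors)
print(left)
```

Both lists should contain the same rows, which is a quick way to check that the delete removes exactly what the SELECT leaves out.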