I have always deleted duplicates with this kind of query:
delete from test a
using test b
where a.ctid < b.ctid
and a.col1=b.col1
and a.col2=b.col2
and a.col3=b.col3
Also, I have seen this query being used:
DELETE FROM test WHERE test.ctid NOT IN
(SELECT ctid FROM (
SELECT DISTINCT ON (col1, col2) *
FROM test));
And even this one (repeated until you run out of duplicates):
delete from test ju where ju.ctid in
(select ctid from (
select distinct on (col1, col2) * from test ou
where (select count(*) from test inr
where inr.col1= ou.col1 and inr.col2=ou.col2) > 1
Now I have run into a table with 5 million rows, which has indexes on the columns that are going to be matched in the where clause. And now I wonder:
Which of these methods, which all apparently do the same thing, is the most efficient, and why? I just ran the second one and it has been taking over 45 minutes to remove the duplicates. I'm curious which would be the most efficient one, in case I have to remove duplicates from another huge table. It doesn't matter whether the table has a primary key in the first place; one can always be created if needed.
demo:db<>fiddle
Finding duplicates can be easily achieved by using the row_number() window function:
SELECT ctid
FROM(
SELECT
*,
ctid,
row_number() OVER (PARTITION BY col1, col2, col3 ORDER BY ctid)
FROM test
)s
WHERE row_number >= 2
This groups tied rows, orders each group by ctid, and adds a row counter within each group. So every row with row_number > 1 is a duplicate which can be deleted:
DELETE
FROM test
WHERE ctid IN
(
SELECT ctid
FROM(
SELECT
*,
ctid,
row_number() OVER (PARTITION BY col1, col2, col3 ORDER BY ctid)
FROM test
)s
WHERE row_number >= 2
)
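To see what the row counter does, here is a minimal sketch on a throwaway table. The table name dedupe_demo and its sample rows are made up for illustration and are not part of the original question.

```sql
-- Hypothetical demo table: 6 rows in 3 groups, so 3 rows are duplicates.
CREATE TEMP TABLE dedupe_demo (col1 int, col2 int, col3 int);
INSERT INTO dedupe_demo VALUES
    (1, 1, 1), (1, 1, 1), (1, 1, 1),
    (2, 2, 2),
    (3, 3, 3), (3, 3, 3);

-- Show the counter: the first row of each group gets rn = 1,
-- every further row of the same group gets rn = 2, 3, ...
SELECT ctid, col1, col2, col3,
       row_number() OVER (PARTITION BY col1, col2, col3 ORDER BY ctid) AS rn
FROM dedupe_demo
ORDER BY col1, col2, col3, ctid;

-- Delete everything except the first row of each group.
DELETE FROM dedupe_demo
WHERE ctid IN (
    SELECT ctid
    FROM (
        SELECT ctid,
               row_number() OVER (PARTITION BY col1, col2, col3 ORDER BY ctid) AS rn
        FROM dedupe_demo
    ) s
    WHERE rn >= 2
);

-- One row per distinct (col1, col2, col3) remains.
SELECT count(*) FROM dedupe_demo;  -- 3
```

Because the counter is ordered by ctid, the row that survives in each group is the one with the lowest ctid.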
I don't know if this solution is faster than your attempts, but you could give it a try.
Furthermore, as @a_horse_with_no_name already stated, I would recommend using your own identifier column instead of ctid for performance reasons.
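As a rough sketch of that suggestion, a surrogate key can be added and then used in place of ctid. The column name id is hypothetical, the identity syntax assumes PostgreSQL 10 or later, and adding the column rewrites the table, which has its own cost on a table with millions of rows.

```sql
-- Assumption: "id" is a hypothetical column name, not part of the original table.
-- Existing rows are assigned values automatically when the column is added.
ALTER TABLE test
    ADD COLUMN id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY;

-- Same row_number() approach as above, keyed on id instead of ctid.
DELETE FROM test
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               row_number() OVER (PARTITION BY col1, col2, col3 ORDER BY id) AS rn
        FROM test
    ) s
    WHERE rn >= 2
);
```

Whether this is actually faster than the ctid version depends on the data and the plan chosen, so it is worth measuring rather than assuming.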
Edit:
For my test data, your first version seems to be slightly faster than my solution. Your second version seems to be slower, and your third version does not work for me (after fixing the syntax errors, it shows no result).
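If you want to compare the variants on your own table, one possible approach is to run each DELETE under EXPLAIN ANALYZE inside a transaction and roll it back afterwards. This is only a sketch using the first query from the question. Note that EXPLAIN ANALYZE really executes the statement, so the ROLLBACK is what keeps the data intact.

```sql
BEGIN;

-- Executes the delete and reports the actual plan, timings and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
DELETE FROM test a
USING test b
WHERE a.ctid < b.ctid
  AND a.col1 = b.col1
  AND a.col2 = b.col2
  AND a.col3 = b.col3;

-- Undo the delete so the next variant starts from the same rows.
ROLLBACK;
```

The rolled-back delete still leaves dead row versions behind until vacuum runs, so later timings on the same table may be slightly affected.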