I have the following SQL for inspecting duplicate rows in a table that has the following columns: id
, case_id
, raw_name
, initials
, name
, judge_id
, magistrate_id
and score
.
SELECT MIN(id), case_id, initials, raw_name, count(*)
FROM my_table
GROUP BY case_id, raw_name, initials, name, judge_id, magistrate_id
HAVING count(*) > 1;
(A row is considered duplicate if it contains the same values in the case_id
, raw_name
, initials
, name
, judge_id
and magistrate_id
columns.)
How can get the total number of duplicate rows that need to be deleted (leaving 1 row remaining for each set of duplicates)?
Duplicate problems can often be expressed in terms of EXISTS(the other)
SELECT COUNT(*)
FROM my_table mt
WHERE EXISTS ( SELECT *
FROM my_table x
WHERE x.case_id = mt.case_id -- exactly the same "keys"
AND x.raw_name = mt.raw_name
AND x.initials = mt.initials
AND x.name = mt.name
AND x.judge_id = mt.judge_id
AND x.magistrate_id = mt.magistrate_id
AND x.id < mt.id -- but a smaller (surrogate) key
-- If your table doesn't have a unique (surrogate) key,
-- you can use the internal "ctid" which is guaranteed to be unique
-- AND x.ctid < mt.ctid
);
For your final delete query: just replace the SELECT COUNT(*)
by DELETE
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With