Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Count total number of duplicate rows in a table

Tags:

sql

postgresql

I have the following SQL for inspecting duplicate rows in a table that has the following columns: id, case_id, raw_name, initials, name, judge_id, magistrate_id and score.

SELECT MIN(id), case_id, initials, raw_name, count(*)
FROM my_table
GROUP BY case_id, raw_name, initials, name, judge_id, magistrate_id
HAVING count(*) > 1;

(A row is considered duplicate if it contains the same values in the case_id, raw_name, initials, name, judge_id and magistrate_id columns.)

How can get the total number of duplicate rows that need to be deleted (leaving 1 row remaining for each set of duplicates)?

like image 320
bard Avatar asked Nov 08 '22 12:11

bard


1 Answers

Duplicate problems can often be expressed in terms of EXISTS(the other)

SELECT COUNT(*)
FROM my_table mt
WHERE EXISTS ( SELECT *
        FROM my_table x
        WHERE x.case_id = mt.case_id -- exactly the same "keys"
        AND x.raw_name = mt.raw_name
        AND x.initials = mt.initials
        AND x.name = mt.name
        AND x.judge_id = mt.judge_id
        AND x.magistrate_id = mt.magistrate_id
        AND x.id < mt.id             -- but a smaller (surrogate) key
          -- If your table doesn't have a unique (surrogate) key,
          -- you can use the internal "ctid" which is guaranteed to be unique
          -- AND x.ctid < mt.ctid
        );

For your final delete query: just replace the SELECT COUNT(*) by DELETE.

like image 97
joop Avatar answered Nov 14 '22 21:11

joop