Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

duplicate multi column entries postgresql

I have a bunch of data in a postgresql database. I think that two keys should form a unique pair, so want to enforce that in the database. I try

create unique index key1_key2_idx on table(key1,key2)

but that fails, telling me that I have duplicate entries.

How do I find these duplicate entries so I can delete them?

like image 724
Greg Hennessy Avatar asked Jul 05 '13 15:07

Greg Hennessy


2 Answers

select key1,key2,count(*)
from table
group by key1,key2
having count(*) > 1
order by 3 desc;

The critical part of the query to determine the duplicates is having count(*) > 1.

There are a whole bunch of neat tricks at the following link, including some examples of removing duplicates: http://postgres.cz/wiki/PostgreSQL_SQL_Tricks

like image 147
bma Avatar answered Nov 10 '22 22:11

bma


Assuming you only want to delete the duplicates and keep the original, the accepted answer is inaccurate -- it'll delete your originals as well and only keep records that have one entry from the start. This works on 9.x:

SELECT * FROM tblname WHERE ctid IN 
(SELECT ctid FROM 
  (SELECT ctid, ROW_NUMBER() OVER 
    (partition BY col1, col2, col3 ORDER BY ctid) AS rnum 
  FROM tblname) t
WHERE t.rnum > 1);

https://wiki.postgresql.org/wiki/Deleting_duplicates

like image 26
rocksfrow Avatar answered Nov 10 '22 23:11

rocksfrow