
Find duplicate rows with PostgreSQL

We have a table of photos with the following columns:

id, merchant_id, url  

This table contains duplicate values for the combination (merchant_id, url), so the same combination can appear several times:

234  some_merchant  http://www.some-image-url.com/abscde1213
235  some_merchant  http://www.some-image-url.com/abscde1213
236  some_merchant  http://www.some-image-url.com/abscde1213

What is the best way to delete these duplicates? (I use PostgreSQL 9.2 and Rails 3.)

asked Jan 23 '13 at 01:01 by schlubbi



2 Answers

Here is my take on it.

    SELECT *
    FROM (
      SELECT id,
             ROW_NUMBER() OVER (PARTITION BY merchant_id, url ORDER BY id ASC) AS Row
      FROM Photos
    ) dups
    WHERE dups.Row > 1;

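The rows returned (Row > 1) are the duplicates; the first row in each (merchant_id, url) group is the one kept. Feel free to play with the ORDER BY to tailor which records get deleted to your specification.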

SQL Fiddle => http://sqlfiddle.com/#!15/d6941/1/0


SQL Fiddle for Postgres 9.2 is no longer supported; updating SQL Fiddle to postgres 9.3
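
The query above only lists the duplicate ids. A minimal sketch of turning it into an actual delete, assuming id is unique (for example, the primary key) and that keeping the lowest id per group is what you want; the alias rn is just an illustrative name:

```sql
-- Sketch only: removes every row except the lowest id
-- in each (merchant_id, url) group. Assumes id is unique.
DELETE FROM Photos
WHERE id IN (
  SELECT id
  FROM (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY merchant_id, url ORDER BY id ASC) AS rn
    FROM Photos
  ) dups
  WHERE dups.rn > 1
);
```

Switching to ORDER BY id DESC would keep the newest row in each group instead.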

answered Sep 23 '22 at 06:09 by MatthewJ


The second part of sgeddes's answer doesn't work on Postgres (the fiddle uses MySQL). Here is an updated version of his answer using Postgres: http://sqlfiddle.com/#!12/6b1a7/1

    DELETE FROM Photos AS P1
    USING Photos AS P2
    WHERE P1.id > P2.id
      AND P1.merchant_id = P2.merchant_id
      AND P1.url = P2.url;
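
Since this delete is destructive, one cautious way to run it is inside a transaction with a check before committing. This is only a sketch of that idea, not part of the original answer; the alias copies is illustrative:

```sql
BEGIN;

-- Delete every row that has a lower-id twin with the same merchant_id and url.
DELETE FROM Photos AS P1
USING Photos AS P2
WHERE P1.id > P2.id
  AND P1.merchant_id = P2.merchant_id
  AND P1.url = P2.url;

-- Check: list any (merchant_id, url) pair that still occurs more than once.
SELECT merchant_id, url, COUNT(*) AS copies
FROM Photos
GROUP BY merchant_id, url
HAVING COUNT(*) > 1;

-- Run COMMIT if the check returns no rows; run ROLLBACK instead to undo the delete.
COMMIT;
```

Note that the = comparisons never match NULL, so rows where merchant_id or url is NULL are not removed by this delete, while GROUP BY does group NULLs together. If those columns are nullable, the check can still report leftovers.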
answered Sep 25 '22 at 06:09 by 11101101b