Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Delete duplicate rows from table with no unique key

How do I delete duplicates rows in Postgres 9 table, the rows are completely duplicates on every field AND there is no individual field that could be used as a unique key so I cant just GROUP BY columns and use a NOT IN statement.

I'm looking for a single SQL statement, not a solution that requires me to create temporary table and insert records into that. I know how to do that but requires more work to fit into my automated process.

Table definition:

jthinksearch=> \d releases_labels;
Unlogged table "discogs.releases_labels"
   Column   |  Type   | Modifiers
------------+---------+-----------
 label      | text    |
 release_id | integer |
 catno      | text    |
Indexes:
    "releases_labels_catno_idx" btree (catno)
    "releases_labels_name_idx" btree (label)
Foreign-key constraints:
    "foreign_did" FOREIGN KEY (release_id) REFERENCES release(id)

Sample data:

jthinksearch=> select * from releases_labels  where release_id=6155;
    label     | release_id |   catno
--------------+------------+------------
 Warp Records |       6155 | WAP 39 CDR
 Warp Records |       6155 | WAP 39 CDR
like image 594
Paul Taylor Avatar asked Apr 02 '15 09:04

Paul Taylor


People also ask

How can I delete duplicate rows where no unique row ID exists?

RANK function to SQL delete duplicate rows We can use the SQL RANK function to remove the duplicate rows as well. SQL RANK function gives unique row ID for each row irrespective of the duplicate row. In the following query, we use a RANK function with the PARTITION BY clause.

How can I delete duplicate records in a table without primary key?

So to delete the duplicate record with SQL Server we can use the SET ROWCOUNT command to limit the number of rows affected by a query. By setting it to 1 we can just delete one of these rows in the table. Note: the select commands are just used to show the data prior and after the delete occurs.

How can I delete duplicate rows in a table?

If a table has duplicate rows, we can delete it by using the DELETE statement. In the case, we have a column, which is not the part of group used to evaluate the duplicate records in the table.

Can you delete duplicate rows in a table without using a temporary table?

You could do this without a problem by using a common table expression (CTE), you don't need to use any temporary tables at all. Just be careful if the delete is going against a high traffic table. Deleting large amounts of data can cause locking and blocking, also the tran log will be hit.


1 Answers

If you can afford to rewrite the whole table, this is probably the simplest approach:

WITH Deleted AS (
  DELETE FROM discogs.releases_labels
  RETURNING *
)
INSERT INTO discogs.releases_labels
SELECT DISTINCT * FROM Deleted

If you need to specifically target the duplicated records, you can make use of the internal ctid field, which uniquely identifies a row:

DELETE FROM discogs.releases_labels
WHERE ctid NOT IN (
  SELECT MIN(ctid)
  FROM discogs.releases_labels
  GROUP BY label, release_id, catno
)

Be very careful with ctid; it changes over time. But you can rely on it staying the same within the scope of a single statement.

like image 85
Nick Barnes Avatar answered Sep 27 '22 23:09

Nick Barnes