
Delete duplicate rows from table with no unique key

Tags:

sql

duplicates

postgresql

duplicate-removal

How do I delete duplicate rows in a Postgres 9 table? The rows are complete duplicates on every field, and there is no individual field that could be used as a unique key, so I can't just GROUP BY columns and use a NOT IN statement.

I'm looking for a single SQL statement, not a solution that requires me to create a temporary table and insert records into it. I know how to do that, but it requires more work to fit into my automated process.

Table definition:

jthinksearch=> \d releases_labels;
Unlogged table "discogs.releases_labels"
   Column   |  Type   | Modifiers
------------+---------+-----------
 label      | text    |
 release_id | integer |
 catno      | text    |
Indexes:
    "releases_labels_catno_idx" btree (catno)
    "releases_labels_name_idx" btree (label)
Foreign-key constraints:
    "foreign_did" FOREIGN KEY (release_id) REFERENCES release(id)

Sample data:

jthinksearch=> select * from releases_labels  where release_id=6155;
    label     | release_id |   catno
--------------+------------+------------
 Warp Records |       6155 | WAP 39 CDR
 Warp Records |       6155 | WAP 39 CDR

asked Apr 02 '15 09:04

Paul Taylor

1 Answer

If you can afford to rewrite the whole table, this is probably the simplest approach:

WITH Deleted AS (
  DELETE FROM discogs.releases_labels
  RETURNING *
)
INSERT INTO discogs.releases_labels
SELECT DISTINCT * FROM Deleted
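
As a sanity check before and after the rewrite, a query along these lines lists each fully duplicated row together with its number of copies. It is only a sketch that assumes the three columns shown in the table definition in the question; once the duplicates are gone it should return no rows.

```sql
-- List every (label, release_id, catno) combination that occurs more than once.
-- GROUP BY treats NULLs as equal, so rows duplicated on NULL values are counted too.
SELECT label, release_id, catno, count(*) AS copies
FROM discogs.releases_labels
GROUP BY label, release_id, catno
HAVING count(*) > 1
ORDER BY copies DESC;
```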

If you need to specifically target the duplicated records, you can make use of the internal ctid field, which uniquely identifies a row:

DELETE FROM discogs.releases_labels
WHERE ctid NOT IN (
  SELECT MIN(ctid)
  FROM discogs.releases_labels
  GROUP BY label, release_id, catno
)

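Be very careful with ctid; it identifies the physical location of a row version, so it changes when the row is updated or the table is rewritten (for example by VACUUM FULL). But you can rely on it staying the same within the scope of a single statement.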
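
In case MIN(ctid) is rejected on your server version, the same idea can be sketched with a window function instead of an aggregate. This is an alternative formulation, not part of the statement above; the alias `numbered` and the column `rn` are names made up for illustration. Because the rows in each group are identical, it does not matter which copy is kept.

```sql
-- Number the rows inside each group of identical (label, release_id, catno) values,
-- then delete every row except the one numbered 1.
DELETE FROM discogs.releases_labels
WHERE ctid IN (
  SELECT ctid
  FROM (
    SELECT ctid,
           row_number() OVER (PARTITION BY label, release_id, catno) AS rn
    FROM discogs.releases_labels
  ) AS numbered
  WHERE rn > 1
);
```

Like GROUP BY, PARTITION BY puts NULL values in the same group, so duplicates containing NULLs are handled as well. It is still a single statement, so the ctid values it reads stay valid for the duration of the delete.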

answered Sep 27 '22 23:09

Nick Barnes
