Delete duplicate rows from small table

People also ask

How do I remove duplicate rows from student table?

The basic syntax to eliminate duplicate records from a table is: SELECT DISTINCT column1, column2,.... columnN. FROM table _name.

Can you delete duplicate rows in a table without using a temporary table?

You can't. Fully duplicate records (ones which have every field equal) can't be deleted by one, because you can't differentiate them in a Where clause in a delete query. The only way would be to do a select distinct query to select all rows without duplicates, then insert them in an empty table.

A faster solution is

DELETE FROM dups a USING (
      SELECT MIN(ctid) as ctid, key
        FROM dups 
        GROUP BY key HAVING COUNT(*) > 1
      ) b
      WHERE a.key = b.key 
      AND a.ctid <> b.ctid

DELETE FROM dupes a
WHERE a.ctid <> (SELECT min(b.ctid)
                 FROM   dupes b
                 WHERE  a.key = b.key);

This is fast and concise:

DELETE FROM dupes T1
    USING   dupes T2
WHERE   T1.ctid < T2.ctid  -- delete the older versions
    AND T1.key  = T2.key;  -- add more columns if needed

See also my answer at How to delete duplicate rows without unique identifier which includes more information.

I tried this:

DELETE FROM tablename
WHERE id IN (SELECT id
              FROM (SELECT id,
                             ROW_NUMBER() OVER (partition BY column1, column2, column3 ORDER BY id) AS rnum
                     FROM tablename) t
              WHERE t.rnum > 1);

provided by Postgres wiki:

https://wiki.postgresql.org/wiki/Deleting_duplicates

EXISTS is simple and among the fastest for most data distributions:

DELETE FROM dupes d
WHERE  EXISTS (
   SELECT FROM dupes
   WHERE  key = d.key
   AND    ctid < d.ctid
   );

From each set of duplicate rows (defined by identical key), this keeps the one row with the minimum ctid.

Result is identical to the currently accepted answer by a_horse. Just faster, because EXISTS can stop evaluating as soon as the first offending row is found, while the alternative with min() has to consider all rows per group to compute the minimum. Speed is of no concern to this question, but why not take it?

You may want to add a UNIQUE constraint after cleaning up, to prevent duplicates from creeping back in:

ALTER TABLE dupes ADD CONSTRAINT constraint_name_here UNIQUE (key);

About the system column ctid:

Is the system column “ctid” legitimate for identifying rows to delete?

If there is any other column defined UNIQUE NOT NULL column in the table (like a PRIMARY KEY) then, by all means, use it instead of ctid.

If key can be NULL and you only want one of those, too, use IS NOT DISTINCT FROM instead of =. See:

How do I (or can I) SELECT DISTINCT on multiple columns?

As that's slower, you might instead run the above query as is, and this in addition:

DELETE FROM dupes d
WHERE  key IS NULL
AND    EXISTS (
   SELECT FROM dupes
   WHERE  key IS NULL
   AND    ctid < d.ctid
   );

And consider:

Create unique constraint with null columns

For small tables, indexes generally do not help performance. And we need not look further.

For big tables and few duplicates, an existing index on (key) can help (a lot).

For mostly duplicates, an index may add more cost than benefit, as it has to be kept up to date concurrently. Finding duplicates without index becomes faster anyway because there are so many and EXISTS only needs to find one. But consider a completely different approach if you can afford it (i.e. concurrent access allows it): Write the few surviving rows to a new table. That also removes table (and index) bloat in the process. See:

How to delete duplicate entries?

Related questions
                            
                                SQL statement to select all rows from previous day
                            
                                SQLite Reset Primary Key Field
                            
                                How can I schedule a job to run a SQL query daily?
                            
                                How to extract year and month from date in PostgreSQL without using to_char() function?
                            
                                Is there any difference between "!=" and "<>" in Oracle Sql?
                            
                                Naming convention for unique constraint
                            
                                What's best SQL datatype for storing JSON string?
                            
                                What is the reason not to use select *?
                            
                                Insert all values of a table into another table in SQL
                            
                                How to detect if a stored procedure already exists
                            
                                Formatting Numbers by padding with leading zeros in SQL Server
                            
                                How to use Oracle ORDER BY and ROWNUM correctly?
                            
                                Good reasons NOT to use a relational database?
                            
                                Unknown Column In Where Clause
                            
                                How can I query a value in SQL Server XML column
                            
                                How do I drop table variables in SQL-Server? Should I even do this?
                            
                                How to list records with date from the last 10 days?
                            
                                Storing sex (gender) in database
                            
                                How do you copy a record in a SQL table but swap out the unique id of the new row?
                            
                                How to select only 1 row from oracle sql?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Delete duplicate rows from small table

Tags:

sql

duplicates

postgresql

People also ask

Recent Activity

Donate For Us