
How to delete duplicate entries?

I have to add a unique constraint to an existing table. This is fine except that the table has millions of rows already, and many of the rows violate the unique constraint I need to add.

What is the fastest approach to removing the offending rows? I have an SQL statement which finds the duplicates and deletes them, but it is taking forever to run. Is there another way to solve this problem? Maybe backing up the table, then restoring after the constraint is added?

gjrwebber asked Nov 17 '09 02:11





1 Answer

Some of these approaches seem a little complicated, and I generally do this as:

Given a table (called table here as a placeholder) that you want to make unique on (field1, field2), keeping the row with the max field3:

DELETE FROM table USING table alias WHERE table.field1 = alias.field1 AND table.field2 = alias.field2 AND table.field3 < alias.field3;

For example, I have a table, user_accounts, and I want to add a unique constraint on email, but I have some duplicates. Say also that I want to keep the most recently created one (max id among duplicates).

DELETE FROM user_accounts USING user_accounts ua2 WHERE user_accounts.email = ua2.email AND user_accounts.id < ua2.id;
  • Note: USING is not standard SQL; it is a PostgreSQL extension (a very useful one), and the original question specifically mentions PostgreSQL.
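
If it helps, here is a rough sketch of the whole sequence for the user_accounts example: delete the duplicates, check that none are left, then add the constraint, all in one transaction so that a failure rolls everything back. It assumes id is unique within the table, and the constraint name user_accounts_email_key is my own placeholder, not something from the question.

```sql
BEGIN;

-- Remove every row that has a same-email sibling with a higher id,
-- so only the newest row (max id) per email survives.
DELETE FROM user_accounts
USING user_accounts ua2
WHERE user_accounts.email = ua2.email
  AND user_accounts.id < ua2.id;

-- Sanity check: this should return zero rows before the constraint is added.
SELECT email, count(*)
FROM user_accounts
GROUP BY email
HAVING count(*) > 1;

-- Constraint name is an assumption; use whatever fits your naming scheme.
ALTER TABLE user_accounts
  ADD CONSTRAINT user_accounts_email_key UNIQUE (email);

COMMIT;
```

Rows with a NULL email are left alone, since the equality test in the WHERE clause is never true for NULLs, and a PostgreSQL UNIQUE constraint allows multiple NULLs by default anyway.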
Tim answered Oct 03 '22 05:10
