I have to add a unique constraint to an existing table. This is fine except that the table has millions of rows already, and many of the rows violate the unique constraint I need to add.
What is the fastest approach to removing the offending rows? I have an SQL statement which finds the duplicates and deletes them, but it is taking forever to run. Is there another way to solve this problem? Maybe backing up the table, then restoring after the constraint is added?
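For context, one common version of that backup-and-restore idea is to copy one row per duplicate group into a new table and then swap the tables. This is only a sketch: mytable, col1, and the id tie-breaker are hypothetical names, and DISTINCT ON is PostgreSQL-specific.

-- Sketch: keep one row per col1 (the one with the highest id), in a new table.
CREATE TABLE mytable_dedup AS
SELECT DISTINCT ON (col1) *
FROM mytable
ORDER BY col1, id DESC;

-- Swap the new table in, then the constraint can be added.
BEGIN;
ALTER TABLE mytable RENAME TO mytable_old;
ALTER TABLE mytable_dedup RENAME TO mytable;
COMMIT;

ALTER TABLE mytable ADD CONSTRAINT mytable_col1_key UNIQUE (col1);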
Some of these approaches seem a little complicated; I generally do this as follows:
Given a table named table that we want to make unique on (field1, field2), keeping the row with the max field3:

DELETE FROM table USING table alias
  WHERE table.field1 = alias.field1
    AND table.field2 = alias.field2
    AND table.field3 < alias.field3;
For example: I have a table, user_accounts, and I want to add a unique constraint on email, but I have some duplicates. Say also that I want to keep the most recently created one (the max id among duplicates).

DELETE FROM user_accounts USING user_accounts ua2
  WHERE user_accounts.email = ua2.email
    AND user_accounts.id < ua2.id;
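To connect this back to the original question, here is an end-to-end sketch: the table definition and sample rows are invented for illustration, but the DELETE and the final ALTER TABLE show the full workflow of de-duplicating and then adding the unique constraint.

-- Hypothetical setup, for illustration only:
CREATE TABLE user_accounts (
    id    serial PRIMARY KEY,
    email text NOT NULL
);
INSERT INTO user_accounts (email) VALUES
    ('a@example.com'), ('a@example.com'), ('b@example.com');

-- Remove all but the newest row per email:
DELETE FROM user_accounts USING user_accounts ua2
  WHERE user_accounts.email = ua2.email
    AND user_accounts.id < ua2.id;

-- With the duplicates gone, the constraint can now be added:
ALTER TABLE user_accounts
  ADD CONSTRAINT user_accounts_email_key UNIQUE (email);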
USING is not standard SQL; it is a PostgreSQL extension (but a very useful one). That is fine here, since the original question specifically mentions PostgreSQL.
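If portability matters, a standard-SQL alternative is a subquery that keeps only the max id per email. A sketch, assuming id is a non-null primary key as above (it may be slower than the USING form on millions of rows):

DELETE FROM user_accounts
WHERE id NOT IN (
    SELECT MAX(id)          -- the one row to keep for each email
    FROM user_accounts
    GROUP BY email
);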