
Delete all the duplicates except one

Tags: sql, mysql

We have a table business_users with user_id and business_id columns, and it contains duplicates. How can I write a query that deletes all duplicates except one?

Matt Elhotiby asked May 18 '26 20:05


2 Answers

Completely identical rows

If you want to remove completely identical rows, as I understood your question at first, you can select the unique rows into a separate table and rebuild the table data from that.

CREATE TEMPORARY TABLE tmp SELECT DISTINCT * FROM business_users;
DELETE FROM business_users;
INSERT INTO business_users SELECT * FROM tmp;
DROP TABLE tmp;

Be careful if there are any foreign key constraints referencing this table, though, as the temporary deletion of rows might lead to cascaded deletions elsewhere.
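Before emptying the table, it may help to check whether anything references it and what would happen on delete. The following is only a sketch: it assumes business_users lives in the currently selected schema and that your account can read information_schema.

```sql
-- List foreign keys that point at business_users, with their ON DELETE rule.
-- A DELETE_RULE of CASCADE means the DELETE above would also remove child rows.
SELECT TABLE_NAME, CONSTRAINT_NAME, DELETE_RULE
FROM information_schema.REFERENTIAL_CONSTRAINTS
WHERE UNIQUE_CONSTRAINT_SCHEMA = DATABASE()
  AND REFERENCED_TABLE_NAME = 'business_users';
```

An empty result suggests that no other table in this schema has a foreign key to business_users.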

Introducing a unique constraint

If you only care about pairs of user_id and business_id, you probably want to avoid introducing duplicates in the future. You can move the existing data to a temporary table, add a constraint, and then move the table data back, ignoring duplicates.

CREATE TEMPORARY TABLE tmp SELECT * FROM business_users;
DELETE FROM business_users;
ALTER TABLE business_users ADD UNIQUE (user_id, business_id);
INSERT IGNORE INTO business_users SELECT * FROM tmp;
DROP TABLE tmp;

The above answer is based on this answer. The warning about foreign keys applies just as it did in the section above.
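To get a feel for how many rows INSERT IGNORE would skip, you could list the duplicated pairs first. This is a read-only sketch; the column alias copies is made up for illustration.

```sql
-- Show each (user_id, business_id) pair that occurs more than once,
-- together with the number of rows sharing that pair.
SELECT user_id, business_id, COUNT(*) AS copies
FROM business_users
GROUP BY user_id, business_id
HAVING COUNT(*) > 1;
```

Running the same query after the cleanup should return no rows.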

One-shot removal

If you only want to execute a single query, without modifying the table structure in any way, and you have a primary key id identifying each row, then you can try the following:

DELETE FROM business_users WHERE id NOT IN
    (SELECT MIN(id) FROM business_users GROUP BY user_id, business_id);

A similar idea was previously suggested by this answer.

If the above statement fails because MySQL does not allow a subquery to read from the table being deleted from (error 1093, "You can't specify target table 'business_users' for update in FROM clause"), you can again use a temporary table:

CREATE TEMPORARY TABLE tmp
SELECT MIN(id) AS id FROM business_users GROUP BY user_id, business_id;
DELETE FROM business_users WHERE id NOT IN (SELECT id FROM tmp);
DROP TABLE tmp;

If you want to, you can still introduce a uniqueness constraint after cleaning the data in this fashion. To do so, execute the ALTER TABLE line from the previous section.
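Another single-statement variant, not taken from the answers referenced above, is MySQL's multi-table DELETE with a self-join. The sketch below assumes the same primary key column id; the aliases d and k are arbitrary.

```sql
-- Delete every row d for which another row k has the same pair and a
-- smaller id, so only the row with the lowest id survives in each group.
DELETE d
FROM business_users AS d
JOIN business_users AS k
  ON  k.user_id     = d.user_id
  AND k.business_id = d.business_id
  AND k.id          < d.id;
```

One difference to keep in mind: the join compares with =, so rows where user_id or business_id is NULL never match each other and are left alone, whereas GROUP BY places NULLs in the same group.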

MvG answered May 21 '26 09:05


Since you have a primary key, you can use that to pick which rows to keep:

delete from business_users
where id not in (
    select id from (
        select min(id) as id -- Make a list of the primary keys to keep
        from business_users
        group by user_id, business_id -- Group by your duplicated row definition
    ) as a -- Derived table to force an implicit temp table
);

In this way, you won't need to create/drop temp tables and such (except the implicit one).

You might want to put a unique constraint on user_id, business_id so you don't have to worry about this again.
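A possible form of that constraint is sketched below. The key name uq_user_business is made up for illustration, and the statement assumes the duplicates are already gone; otherwise MySQL rejects it with a duplicate-entry error.

```sql
-- Reject future rows that repeat an existing (user_id, business_id) pair.
-- uq_user_business is a hypothetical name; pick whatever fits your schema.
ALTER TABLE business_users
  ADD UNIQUE KEY uq_user_business (user_id, business_id);
```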

Tim Lehner answered May 21 '26 10:05


