MySQL delete duplicate records but keep latest

I have a table with a unique id field and an email field. Emails get duplicated. For each set of duplicates, I only want to keep the one row with the latest id (the last inserted record).

How can I achieve this?

Khuram asked May 24 '11 07:05


2 Answers

Imagine your table test contains the following data:

    select id, email
      from test;

    ID  EMAIL
    --  -----
    1   aaa
    2   bbb
    3   ccc
    4   bbb
    5   ddd
    6   eee
    7   aaa
    8   aaa
    9   eee
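To reproduce the example, the sample table could be created with a script along these lines. This is only a sketch: the column types and the auto-increment primary key are assumptions, since the question only says that the table has a unique id and an email field.

```sql
-- Hypothetical schema: the column types are assumptions, not taken from the question.
CREATE TABLE test (
    id    INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(255) NOT NULL
);

-- Same nine rows as in the listing above.
INSERT INTO test (id, email) VALUES
    (1, 'aaa'), (2, 'bbb'), (3, 'ccc'),
    (4, 'bbb'), (5, 'ddd'), (6, 'eee'),
    (7, 'aaa'), (8, 'aaa'), (9, 'eee');
```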

So, we need to find all repeated emails and, for each of them, delete every row except the one with the latest id.
In this case, aaa, bbb and eee are repeated, so we want to delete IDs 1, 7, 2 and 6.

To accomplish this, first we need to find all the repeated emails:

    select email
      from test
     group by email
    having count(*) > 1;

    EMAIL
    -----
    aaa
    bbb
    eee

Then, from this dataset, we need to find the latest id for each one of these repeated emails:

    select max(id) as lastId, email
      from test
     where email in (
           select email
             from test
            group by email
           having count(*) > 1
           )
     group by email;

    LASTID  EMAIL
    ------  -----
    8       aaa
    4       bbb
    9       eee

Finally, we can delete every row for these emails whose id is smaller than LASTID. So the solution is:

    delete test
      from test
     inner join (
           select max(id) as lastId, email
             from test
            where email in (
                  select email
                    from test
                   group by email
                  having count(*) > 1
                  )
            group by email
     ) duplic on duplic.email = test.email
     where test.id < duplic.lastId;

I don't have MySQL installed on this machine right now, but this should work.

Update

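The above delete works, but the where email in (...) filter is redundant, because having count(*) > 1 can be applied directly to the grouped query. A more optimized version: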

    delete test
      from test
     inner join (
           select max(id) as lastId, email
             from test
            group by email
           having count(*) > 1
     ) duplic on duplic.email = test.email
     where test.id < duplic.lastId;

You can see that it deletes the oldest duplicates, i.e. 1, 7, 2, 6:

    select * from test;
    +----+-------+
    | id | email |
    +----+-------+
    |  3 | ccc   |
    |  4 | bbb   |
    |  5 | ddd   |
    |  8 | aaa   |
    |  9 | eee   |
    +----+-------+
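Before running a delete like this against real data, it can help to preview the affected rows by turning the same join into a select, and to run the delete inside a transaction so it can be undone. The following is a sketch, assuming the table uses a transactional storage engine such as InnoDB (on MyISAM, rollback does not undo the delete).

```sql
-- Preview: rows that have a newer row with the same email.
-- On the original nine sample rows this lists ids 1 and 7 (aaa), 2 (bbb) and 6 (eee).
SELECT test.id, test.email
  FROM test
 INNER JOIN (
       SELECT MAX(id) AS lastId, email
         FROM test
        GROUP BY email
       HAVING COUNT(*) > 1
 ) duplic ON duplic.email = test.email
 WHERE test.id < duplic.lastId
 ORDER BY test.email, test.id;

-- Run the delete inside a transaction (assumes InnoDB).
START TRANSACTION;

DELETE test
  FROM test
 INNER JOIN (
       SELECT MAX(id) AS lastId, email
         FROM test
        GROUP BY email
       HAVING COUNT(*) > 1
 ) duplic ON duplic.email = test.email
 WHERE test.id < duplic.lastId;

-- Sanity check: this should return no rows once the duplicates are gone.
SELECT email, COUNT(*) AS copies
  FROM test
 GROUP BY email
HAVING COUNT(*) > 1;

-- Keep the change, or use ROLLBACK instead if the result looks wrong.
COMMIT;
```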

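Another version is the delete provided by Rene Limon. MySQL normally rejects a subquery that selects directly from the table being deleted from (error 1093, "You can't specify target table 'test' for update in FROM clause"), so the subquery is wrapped in a derived table here:

    delete from test
     where id not in (
           select lastId
             from (
                  select max(id) as lastId
                    from test
                   group by email
             ) as latest
     );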
Jose Rui Santos answered Sep 17 '22 17:09


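Try this method, a self-join that deletes every row for which another row with the same email and a larger id exists, so only the latest id per email is kept:

    DELETE t1
      FROM test t1, test t2
     WHERE t1.id < t2.id
       AND t1.email = t2.email;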
Pulkit Malhotra answered Sep 17 '22 17:09