I have unique id and email fields. Emails get duplicated. For each duplicated email address I want to keep only one row: the one with the latest id (the last inserted record).
How can I achieve this?
Imagine your table test contains the following data:

    select id, email from test;

    ID                     EMAIL
    ---------------------- --------------------
    1                      aaa
    2                      bbb
    3                      ccc
    4                      bbb
    5                      ddd
    6                      eee
    7                      aaa
    8                      aaa
    9                      eee
So we need to find all repeated emails and, for each of them, delete every row except the one with the latest id.
In this case, aaa, bbb and eee are repeated, so we want to delete IDs 1, 7, 2 and 6.
To accomplish this, first we need to find all the repeated emails:

    select email
    from test
    group by email
    having count(*) > 1;

    EMAIL
    --------------------
    aaa
    bbb
    eee

Then, from this dataset, we need to find the latest id for each one of these repeated emails:

    select max(id) as lastId, email
    from test
    where email in (
        select email
        from test
        group by email
        having count(*) > 1
    )
    group by email;

    LASTID                 EMAIL
    ---------------------- --------------------
    8                      aaa
    4                      bbb
    9                      eee

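If you want to check the two queries above without a MySQL server at hand, here is a minimal sketch that runs them against an in-memory SQLite database from Python. SQLite standing in for MySQL is my assumption for illustration only; the table and data are the sample from this answer, and the `order by` clauses are added just to make the output order predictable.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, email TEXT)")
rows = [(1, "aaa"), (2, "bbb"), (3, "ccc"), (4, "bbb"), (5, "ddd"),
        (6, "eee"), (7, "aaa"), (8, "aaa"), (9, "eee")]
conn.executemany("INSERT INTO test (id, email) VALUES (?, ?)", rows)

# Step 1: emails that occur more than once.
repeated = [r[0] for r in conn.execute(
    "SELECT email FROM test GROUP BY email HAVING COUNT(*) > 1 ORDER BY email")]

# Step 2: the latest id for each of those repeated emails.
last_ids = conn.execute(
    """
    SELECT MAX(id) AS lastId, email
    FROM test
    WHERE email IN (SELECT email FROM test GROUP BY email HAVING COUNT(*) > 1)
    GROUP BY email
    ORDER BY email
    """).fetchall()

print(repeated)   # ['aaa', 'bbb', 'eee']
print(last_ids)   # [(8, 'aaa'), (4, 'bbb'), (9, 'eee')]
```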
Finally, we can delete every row for these emails whose id is smaller than the corresponding LASTID. So the solution is:

    delete test
    from test
    inner join (
        select max(id) as lastId, email
        from test
        where email in (
            select email
            from test
            group by email
            having count(*) > 1
        )
        group by email
    ) duplic on duplic.email = test.email
    where test.id < duplic.lastId;

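I don't have MySQL installed on this machine right now, but this should work.
The above delete works, but I found a more optimized version. The inner "where email in (...)" filter is redundant, because the "having count(*) > 1" condition can be applied directly to the grouped query: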

    delete test
    from test
    inner join (
        select max(id) as lastId, email
        from test
        group by email
        having count(*) > 1
    ) duplic on duplic.email = test.email
    where test.id < duplic.lastId;

You can see that it deletes the oldest duplicates, i.e. 1, 7, 2, 6:

    select * from test;
    +----+-------+
    | id | email |
    +----+-------+
    |  3 | ccc   |
    |  4 | bbb   |
    |  5 | ddd   |
    |  8 | aaa   |
    |  9 | eee   |
    +----+-------+

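Before running a delete like this on real data, it can help to preview which rows the join would hit. The sketch below turns the same join into a plain select and runs it on the sample data. It uses an in-memory SQLite database from Python instead of MySQL, which is my assumption for illustration, and the `order by` is only there to give a stable output order.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, email TEXT)")
rows = [(1, "aaa"), (2, "bbb"), (3, "ccc"), (4, "bbb"), (5, "ddd"),
        (6, "eee"), (7, "aaa"), (8, "aaa"), (9, "eee")]
conn.executemany("INSERT INTO test (id, email) VALUES (?, ?)", rows)

# Same join as the delete, but selecting the ids instead of removing them.
to_delete = [r[0] for r in conn.execute(
    """
    SELECT test.id
    FROM test
    INNER JOIN (
        SELECT MAX(id) AS lastId, email
        FROM test
        GROUP BY email
        HAVING COUNT(*) > 1
    ) duplic ON duplic.email = test.email
    WHERE test.id < duplic.lastId
    ORDER BY test.id
    """)]

print(to_delete)  # [1, 2, 6, 7]
```

These are the same ids (1, 7, 2, 6) that the answer expects to be deleted.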
Another version is the delete provided by Rene Limon:
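
    -- MySQL raises error 1093 if the subquery reads test directly,
    -- so the ids to keep are wrapped in a derived table.
    delete from test
    where id not in (
        select lastId from (
            select max(id) as lastId
            from test
            group by email
        ) as latest
    );
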
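Try this method. It joins the table to itself and deletes every row for which another row with the same email and a higher id exists, so only the latest id of each email is kept:

    DELETE t1
    FROM test t1, test t2
    WHERE t1.id < t2.id
      AND t1.email = t2.email;
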
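With a self-join delete, the direction of the id comparison decides which copy survives, so it is worth checking on sample data first. Here is a rough sketch using an in-memory SQLite database from Python. SQLite has no multi-table DELETE, so the join is rewritten as a correlated EXISTS; that rewrite and the helper name `survivors` are my assumptions, not part of the answer above.

```python
import sqlite3

ROWS = [(1, "aaa"), (2, "bbb"), (3, "ccc"), (4, "bbb"), (5, "ddd"),
        (6, "eee"), (7, "aaa"), (8, "aaa"), (9, "eee")]

def survivors(comparison):
    """Delete duplicates using the given id comparison; return the ids left."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany("INSERT INTO test (id, email) VALUES (?, ?)", ROWS)
    # comparison is a fixed literal ('<' or '>') chosen below, never user input.
    conn.execute(f"""
        DELETE FROM test
        WHERE EXISTS (
            SELECT 1 FROM test AS t2
            WHERE t2.email = test.email AND test.id {comparison} t2.id
        )
    """)
    return [r[0] for r in conn.execute("SELECT id FROM test ORDER BY id")]

keep_newest = survivors("<")  # a row goes if a higher id shares its email
keep_oldest = survivors(">")  # a row goes if a lower id shares its email

print(keep_newest)  # [3, 4, 5, 8, 9]
print(keep_oldest)  # [1, 2, 3, 5, 6]
```

Only the `<` form leaves the latest id of each email, which is what the question asks for.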