How would I delete all duplicate data from a MySQL Table?
For example, with the following data:
SELECT * FROM names; +----+--------+ | id | name | +----+--------+ | 1 | google | | 2 | yahoo | | 3 | msn | | 4 | google | | 5 | google | | 6 | yahoo | +----+--------+
I would use SELECT DISTINCT name FROM names;
if it were a SELECT
query.
How would I do this with DELETE
to only remove duplicates and keep just one record of each?
You can use DELETE command with some condition for this since we need to keep one record and delete rest of the duplicate records. The above query deleted 2 rows for “Carol” and left one of the “Carol” record.
Eliminating Duplicates from a Query Resultmysql> SELECT DISTINCT last_name, first_name -> FROM person_tbl -> ORDER BY last_name; An alternative to the DISTINCT command is to add a GROUP BY clause that names the columns you are selecting.
Editor warning: This solution is computationally inefficient and may bring down your connection for a large table.
NB - You need to do this first on a test copy of your table!
When I did it, I found that unless I also included AND n1.id <> n2.id
, it deleted every row in the table.
If you want to keep the row with the lowest id
value:
DELETE n1 FROM names n1, names n2 WHERE n1.id > n2.id AND n1.name = n2.name
If you want to keep the row with the highest id
value:
DELETE n1 FROM names n1, names n2 WHERE n1.id < n2.id AND n1.name = n2.name
I used this method in MySQL 5.1
Not sure about other versions.
Update: Since people Googling for removing duplicates end up here
Although the OP's question is about DELETE
, please be advised that using INSERT
and DISTINCT
is much faster. For a database with 8 million rows, the below query took 13 minutes, while using DELETE
, it took more than 2 hours and yet didn't complete.
INSERT INTO tempTableName(cellId,attributeId,entityRowId,value) SELECT DISTINCT cellId,attributeId,entityRowId,value FROM tableName;
If you want to keep the row with the lowest id
value:
DELETE FROM NAMES WHERE id NOT IN (SELECT * FROM (SELECT MIN(n.id) FROM NAMES n GROUP BY n.name) x)
If you want the id
value that is the highest:
DELETE FROM NAMES WHERE id NOT IN (SELECT * FROM (SELECT MAX(n.id) FROM NAMES n GROUP BY n.name) x)
The subquery in a subquery is necessary for MySQL, or you'll get a 1093 error.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With