
Delete all Duplicate Rows except for One in MySQL? [duplicate]

How would I delete all duplicate data from a MySQL Table?

For example, with the following data:

SELECT * FROM names;

+----+--------+
| id | name   |
+----+--------+
| 1  | google |
| 2  | yahoo  |
| 3  | msn    |
| 4  | google |
| 5  | google |
| 6  | yahoo  |
+----+--------+

I would use SELECT DISTINCT name FROM names; if it were a SELECT query.

How would I do this with DELETE to only remove duplicates and keep just one record of each?

asked Jan 13 '11 at 20:01 by Highway of Life





2 Answers

Editor warning: This solution is computationally inefficient and may bring down your connection for a large table.

NB - You need to do this first on a test copy of your table!

When I did it, I found that unless I also included AND n1.id <> n2.id, it deleted every row in the table.
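To see why the id comparison matters, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; the self-join logic is the same, only the engine differs). The helper `doomed_ids` is hypothetical: it lists the n1.id values a self-join DELETE would target under a given extra condition, using the sample data from the question.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the sample rows are the question's data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO names (id, name) VALUES (?, ?)",
    [(1, "google"), (2, "yahoo"), (3, "msn"), (4, "google"), (5, "google"), (6, "yahoo")],
)

def doomed_ids(extra_condition):
    """Hypothetical helper: the n1.id values a self-join DELETE would hit."""
    rows = conn.execute(
        "SELECT DISTINCT n1.id FROM names n1, names n2 "
        "WHERE n1.name = n2.name " + extra_condition + " ORDER BY n1.id"
    ).fetchall()
    return [r[0] for r in rows]

# Name equality alone: every row matches itself, so every id is targeted.
print(doomed_ids(""))                     # [1, 2, 3, 4, 5, 6]
# <> stops self-matches, but still targets every copy of a duplicated name.
print(doomed_ids("AND n1.id <> n2.id"))   # [1, 2, 4, 5, 6]
# A one-directional comparison spares the lowest id of each name.
print(doomed_ids("AND n1.id > n2.id"))    # [4, 5, 6]
```

Note that `n1.id <> n2.id` on its own would remove every google and yahoo row; it is the one-directional `>` or `<` in the queries that leaves exactly one row per name.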

  1. If you want to keep the row with the lowest id value:

    DELETE n1
    FROM names n1, names n2
    WHERE n1.id > n2.id
      AND n1.name = n2.name;

  2. If you want to keep the row with the highest id value:

    DELETE n1
    FROM names n1, names n2
    WHERE n1.id < n2.id
      AND n1.name = n2.name;

I used this method in MySQL 5.1; I'm not sure about other versions.
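Both queries can be checked end to end with the following sketch. It assumes SQLite via Python's sqlite3 module in place of MySQL. Since SQLite has no multi-table `DELETE n1 FROM names n1, names n2` syntax, the self-join is rewritten here as an equivalent correlated EXISTS with the same two conditions. `dedupe_self_join` is a hypothetical helper that builds a fresh copy of the question's table on each call.

```python
import sqlite3

ROWS = [(1, "google"), (2, "yahoo"), (3, "msn"), (4, "google"), (5, "google"), (6, "yahoo")]

def dedupe_self_join(keep):
    """Hypothetical helper: keep='lowest' keeps the lowest id per name,
    anything else keeps the highest.

    Assumption: SQLite lacks MySQL's multi-table DELETE, so the self-join
    becomes a correlated EXISTS using the same id and name conditions.
    """
    op = ">" if keep == "lowest" else "<"
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO names (id, name) VALUES (?, ?)", ROWS)
    # Delete a row if another row with the same name has a lower (or higher) id.
    conn.execute(
        f"DELETE FROM names WHERE EXISTS ("
        f"  SELECT 1 FROM names n2"
        f"  WHERE names.id {op} n2.id AND names.name = n2.name)"
    )
    return conn.execute("SELECT id, name FROM names ORDER BY id").fetchall()

print(dedupe_self_join("lowest"))   # [(1, 'google'), (2, 'yahoo'), (3, 'msn')]
print(dedupe_self_join("highest"))  # [(3, 'msn'), (5, 'google'), (6, 'yahoo')]
```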


Update: Since people Googling how to remove duplicates end up here: although the OP's question is about DELETE, please be advised that using INSERT with DISTINCT is much faster. The idea is to copy the distinct rows into a new table and then replace the original table with it. For a database with 8 million rows, the query below took 13 minutes, while the DELETE approach ran for more than 2 hours and still hadn't completed.

INSERT INTO tempTableName (cellId, attributeId, entityRowId, value)
SELECT DISTINCT cellId, attributeId, entityRowId, value
FROM tableName;
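A sketch of the full copy-and-swap workflow, applied to the question's `names` table and assuming SQLite through Python's sqlite3 module rather than MySQL. `names_dedup` is a hypothetical name for the temporary table. Because `id` is left out of the copied columns, the surviving rows get new ids.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the sample rows are the question's data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO names (id, name) VALUES (?, ?)",
    [(1, "google"), (2, "yahoo"), (3, "msn"), (4, "google"), (5, "google"), (6, "yahoo")],
)

# Step 1: an empty table with the same shape (names_dedup is hypothetical).
conn.execute("CREATE TABLE names_dedup (id INTEGER PRIMARY KEY, name TEXT)")

# Step 2: copy one row per distinct name. id is not in the column list,
# so SQLite assigns fresh ids; the original ids are NOT preserved.
conn.execute("INSERT INTO names_dedup (name) SELECT DISTINCT name FROM names")

# Step 3: swap the deduplicated table in for the original.
conn.execute("DROP TABLE names")
conn.execute("ALTER TABLE names_dedup RENAME TO names")

print([r[0] for r in conn.execute("SELECT name FROM names ORDER BY name")])
# ['google', 'msn', 'yahoo']
```

In MySQL the swap would more likely be done with RENAME TABLE, which can rename several tables in one atomic statement (for example `RENAME TABLE names TO names_old, names_dedup TO names`), so the original can be dropped only after the result has been checked.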
answered Sep 23 '22 at 12:09 by martin.masa



If you want to keep the row with the lowest id value:

DELETE FROM names
WHERE id NOT IN (SELECT *
                 FROM (SELECT MIN(n.id)
                       FROM names n
                       GROUP BY n.name) x);

If you want to keep the row with the highest id value:

DELETE FROM names
WHERE id NOT IN (SELECT *
                 FROM (SELECT MAX(n.id)
                       FROM names n
                       GROUP BY n.name) x);

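Wrapping the subquery in a second subquery (the derived table x) is necessary in MySQL; otherwise you'll get error 1093 ("You can't specify target table 'names' for update in FROM clause").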
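As a rough check of both variants, here is a sketch that runs the same NOT IN / GROUP BY query against SQLite via Python's sqlite3 module (an assumption: SQLite stands in for MySQL and does not raise error 1093, but it accepts the derived-table wrapper, so the SQL is kept as written). `dedupe_not_in` is a hypothetical helper whose `agg` argument selects MIN or MAX.

```python
import sqlite3

ROWS = [(1, "google"), (2, "yahoo"), (3, "msn"), (4, "google"), (5, "google"), (6, "yahoo")]

def dedupe_not_in(agg):
    """Hypothetical helper: agg='MIN' keeps the lowest id per name, 'MAX' the highest."""
    # Only these two keywords are ever interpolated into the SQL text.
    assert agg in ("MIN", "MAX")
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO names (id, name) VALUES (?, ?)", ROWS)
    # Keep the one id chosen per name; delete every other row.
    # The derived table x mirrors the MySQL answer.
    conn.execute(
        f"DELETE FROM names WHERE id NOT IN ("
        f"  SELECT * FROM (SELECT {agg}(n.id) FROM names n GROUP BY n.name) x)"
    )
    return conn.execute("SELECT id, name FROM names ORDER BY id").fetchall()

print(dedupe_not_in("MIN"))  # [(1, 'google'), (2, 'yahoo'), (3, 'msn')]
print(dedupe_not_in("MAX"))  # [(3, 'msn'), (5, 'google'), (6, 'yahoo')]
```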

answered Sep 22 '22 at 12:09 by OMG Ponies
