How to remove duplicate entries from a mysql db?

How do I find duplicate values in MySQL?

To find duplicate values in one column, first use the GROUP BY clause to group all rows by the target column, which is the column you want to check for duplicates. Then use the COUNT() function in the HAVING clause to keep only the groups that have more than one row. Those groups are the duplicates.
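As a minimal sketch of that GROUP BY / HAVING approach, assuming a table named student with an email column (the names used in a later answer on this page; substitute your own):

```sql
-- List each email that occurs more than once, with its row count.
-- Table and column names are assumptions for illustration.
SELECT email, COUNT(*) AS occurrences
FROM student
GROUP BY email
HAVING COUNT(*) > 1;
```

An empty result means the column currently holds no duplicates.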

This command adds a unique key, and drops all rows that generate errors (due to the unique key). This removes duplicates.

ALTER IGNORE TABLE table_name ADD UNIQUE KEY idx1 (title);

Edit: Note that this command may not work for InnoDB tables for some versions of MySQL. See this post for a workaround. (Thanks to "an anonymous user" for this information.)

Create a new table with just the distinct rows of the original table. There may be other ways but I find this the cleanest.

CREATE TABLE tmp_table AS SELECT DISTINCT [....] FROM main_table

More specifically:
The faster way is to insert distinct rows into a temporary table. Using delete, it took me a few hours to remove duplicates from a table of 8 million rows. Using insert and distinct, it took just 13 minutes.

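-- Create an empty copy of the table with the same structure
CREATE TABLE tempTableName LIKE tableName;
-- Index the compared columns to speed up the DISTINCT scan
CREATE INDEX ix_all_id ON tableName(cellId,attributeId,entityRowId,value);
-- Copy only the distinct rows
INSERT INTO tempTableName(cellId,attributeId,entityRowId,value) SELECT DISTINCT cellId,attributeId,entityRowId,value FROM tableName;
-- Empty the original table (it must still exist for the next INSERT)
TRUNCATE TABLE tableName;
INSERT INTO tableName SELECT * FROM tempTableName;
DROP TABLE tempTableName;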

Since MySQL's ALTER IGNORE TABLE has been deprecated (and was removed in MySQL 5.7.4), you need to actually delete the duplicate data before adding an index.

First write a query that finds all the duplicates. Here I'm assuming that email is the field that contains duplicates.

SELECT
    s1.email,
    s1.id,
    s1.created,
    s2.id,
    s2.created
FROM 
    student AS s1 
INNER JOIN 
    student AS s2 
WHERE 
    /* Emails are the same */
    s1.email = s2.email AND
    /* DON'T select both accounts,
       only select the one created later.
       The serial id could also be used here */
    s2.created > s1.created 
;

Next select only the unique duplicate ids:

SELECT 
    DISTINCT s2.id
FROM 
    student AS s1 
INNER JOIN 
    student AS s2 
WHERE 
    s1.email = s2.email AND
    s2.created > s1.created 
;

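Once you are sure that the result contains only the duplicate ids you want to delete, run the delete. You have to wrap the table in a derived table, (SELECT * FROM student), so that MySQL doesn't complain that the subquery reads from the table being modified (error 1093).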

DELETE FROM
    student 
WHERE
    id
IN (
    SELECT 
        DISTINCT s2.id
    FROM 
        (SELECT * FROM student) AS s1 
    INNER JOIN 
        (SELECT * FROM student) AS s2 
    WHERE 
        s1.email = s2.email AND
        s2.created > s1.created 
);

Then create the unique index:

ALTER TABLE
    student
ADD UNIQUE INDEX
    idx_student_unique_email(email)
;
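To confirm the index was created as unique, something like the following can be run. This is a sketch that assumes the index name from the statement above:

```sql
-- Non_unique = 0 in the output means the index enforces uniqueness.
SHOW INDEX FROM student WHERE Key_name = 'idx_student_unique_email';
```

With the index in place, a later insert that repeats an existing email should be rejected with a duplicate-entry error (1062) instead of silently creating a new duplicate.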

The query below deletes all duplicates except the row with the lowest "id" value:

DELETE t1 FROM table_name t1, table_name t2 WHERE t1.id > t2.id AND t1.name = t2.name;

Similarly, to keep the row with the highest "id" value instead:

DELETE t1 FROM table_name t1, table_name t2 WHERE t1.id < t2.id AND t1.name = t2.name;
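Before running either DELETE, it may be worth previewing the rows it would remove. The sketch below mirrors the lowest-id variant as a SELECT, using the same placeholder names (table_name, id, name); flip the comparison to t1.id < t2.id to preview the highest-id variant.

```sql
-- Preview of the rows the lowest-id DELETE would remove:
-- every row that has a same-name row with a lower id.
SELECT DISTINCT t1.*
FROM table_name AS t1
INNER JOIN table_name AS t2
    ON t1.name = t2.name
   AND t1.id > t2.id;
```

DISTINCT is there because a row with several lower-id twins would otherwise appear once per twin.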


Tags: mysql, duplicate-removal
