Removing duplicates with unique index

Tags:

mysql

duplicates

unique-index

I inserted rows from one table into another on fields A, B, C, D, believing I had created a unique index on A, B, C, D to prevent duplicates. However, I had somehow created a normal index on those columns instead, so duplicates got inserted. It is a 20-million-record table.

If I change my existing index from normal to unique, or simply add a new unique index on A, B, C, D, will the duplicates be removed, or will the operation fail because duplicate records exist? I'd test it, but it is 30 million records and I don't want to mess up the table or duplicate it.

asked Apr 15 '16 12:04

user3649739

1 Answer

If you have duplicates in your table and you use

ALTER TABLE mytable ADD UNIQUE INDEX myindex (A, B, C, D);

the query will fail with Error 1062 (duplicate key).
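Because the plain ALTER TABLE aborts and rolls back instead of deleting anything, it is safe to find out up front which keys are duplicated. The sketch below rehearses this on a tiny throwaway table. It assumes SQLite (via Python's built-in sqlite3 module) as a stand-in for MySQL so that it runs anywhere: the GROUP BY ... HAVING query is plain SQL that should also work in MySQL, while CREATE UNIQUE INDEX is SQLite's spelling of ADD UNIQUE INDEX and sqlite3.IntegrityError plays the role of error 1062. The helper names and sample rows are hypothetical.

```python
import sqlite3


def make_sample(conn):
    # Hypothetical stand-in for the real table: same name and columns, made-up rows
    conn.execute("CREATE TABLE mytable (A INT, B INT, C INT, D INT)")
    conn.executemany(
        "INSERT INTO mytable VALUES (?, ?, ?, ?)",
        [(1, 1, 1, 1), (1, 1, 1, 1), (2, 2, 2, 2),
         (3, 3, 3, 3), (3, 3, 3, 3), (3, 3, 3, 3)],
    )


def find_duplicate_groups(conn):
    # One result row per (A, B, C, D) key that occurs more than once, with its count
    return conn.execute(
        "SELECT A, B, C, D, COUNT(*) FROM mytable "
        "GROUP BY A, B, C, D HAVING COUNT(*) > 1 ORDER BY A, B, C, D"
    ).fetchall()


def try_add_unique_index(conn):
    # SQLite spelling of: ALTER TABLE mytable ADD UNIQUE INDEX myindex (A, B, C, D)
    try:
        conn.execute("CREATE UNIQUE INDEX myindex ON mytable (A, B, C, D)")
        return True
    except sqlite3.IntegrityError:
        # The statement fails as a whole: no index is created, no rows are removed
        return False


conn = sqlite3.connect(":memory:")
make_sample(conn)
print(find_duplicate_groups(conn))  # [(1, 1, 1, 1, 2), (3, 3, 3, 3, 3)]
print(try_add_unique_index(conn))   # False
print(conn.execute("SELECT COUNT(*) FROM mytable").fetchone()[0])  # 6
```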

But if you use IGNORE

-- (only works before MySQL 5.7.4)
ALTER IGNORE TABLE mytable ADD UNIQUE INDEX myindex (A, B, C, D);

the duplicates will be removed. But the documentation doesn't specify which row will be kept:

IGNORE is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only one row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.

As of MySQL 5.7.4, the IGNORE clause for ALTER TABLE is removed and its use produces an error.

(ALTER TABLE Syntax)

If your version is 5.7.4 or later, you can:

Copy the data into a temporary table (it doesn't technically need to be temporary).
Truncate the original table.
Create the UNIQUE INDEX.
Copy the data back with INSERT IGNORE (which is still available).

CREATE TABLE tmp_data SELECT * FROM mytable;
TRUNCATE TABLE mytable;
ALTER TABLE mytable ADD UNIQUE INDEX myindex (A, B, C, D);
INSERT IGNORE INTO mytable SELECT * FROM tmp_data;
DROP TABLE tmp_data;

If you use the IGNORE modifier, errors that occur while executing the INSERT statement are ignored. For example, without IGNORE, a row that duplicates an existing UNIQUE index or PRIMARY KEY value in the table causes a duplicate-key error and the statement is aborted. With IGNORE, the row is discarded and no error occurs. Ignored errors generate warnings instead.

(INSERT Syntax)
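The copy / empty / index / INSERT IGNORE procedure can be rehearsed on a handful of rows before touching the real table. This is a sketch under the assumption that SQLite (via Python's sqlite3 module) stands in for MySQL, so three statements are translated: TRUNCATE TABLE becomes DELETE FROM, ADD UNIQUE INDEX becomes CREATE UNIQUE INDEX, and INSERT IGNORE becomes SQLite's INSERT OR IGNORE. The helper name dedupe_via_copy and the sample rows are hypothetical.

```python
import sqlite3


def dedupe_via_copy(conn):
    """Return (row count before, row count after) the deduplication."""
    before = conn.execute("SELECT COUNT(*) FROM mytable").fetchone()[0]
    conn.execute("CREATE TABLE tmp_data AS SELECT * FROM mytable")  # 1. copy
    conn.execute("DELETE FROM mytable")  # 2. empty (SQLite has no TRUNCATE)
    conn.execute("CREATE UNIQUE INDEX myindex ON mytable (A, B, C, D)")  # 3. index
    # 4. copy back; rows whose (A, B, C, D) key is already present are skipped
    conn.execute("INSERT OR IGNORE INTO mytable SELECT * FROM tmp_data")
    conn.execute("DROP TABLE tmp_data")
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM mytable").fetchone()[0]
    return before, after


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (A INT, B INT, C INT, D INT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 1), (1, 1, 1, 1), (2, 2, 2, 2),
     (3, 3, 3, 3), (3, 3, 3, 3), (3, 3, 3, 3)],
)
print(dedupe_via_copy(conn))  # (6, 3)
```

The difference between the two counts is the number of discarded duplicates; in MySQL those skipped rows should surface as warnings after the INSERT IGNORE. One caveat that applies to MySQL and SQLite alike: a UNIQUE index permits multiple NULL values, so rows with NULL in any of A to D are not collapsed by this procedure.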

Also see: INSERT ... SELECT Syntax and Comparison of the IGNORE Keyword and Strict SQL Mode

answered Oct 03 '22 00:10

Paul Spiegel
