 

Deleting duplicates from a large table

I have quite a large table with 19 000 000 records, and I have a problem with duplicate rows. There are a lot of similar questions, even here on SO, but none of them seems to give me a satisfactory answer. Some points to consider:

  • Row uniqueness is determined by two columns, location_id and datetime.
  • I'd like to keep the execution time as fast as possible (< 1 hour).
  • Copying tables is not very feasible as the table is several gigabytes in size.
  • No need to worry about relations.

As said, each (location_id, datetime) pair must be unique, and I would like to remove all the duplicate instances. It does not matter which one of them survives, as the data is identical.

Any ideas?

asked Mar 05 '10 by Tatu Ulmanen

2 Answers

I think you can use this statement to delete the duplicate records from the table:

ALTER IGNORE TABLE table_name ADD UNIQUE (location_id, datetime)

Before running it on the real table, test it with some sample data first.

Note: on MySQL 5.5 this works on MyISAM tables but not on InnoDB.
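Since ALTER IGNORE TABLE is MySQL-specific (it was removed in MySQL 5.7), a more portable way to get the same effect is to delete every row except one per (location_id, datetime) pair. A minimal sketch using Python's sqlite3 module and a hypothetical readings table standing in for the questioner's schema:

```python
import sqlite3

# In-memory database with a hypothetical "readings" table; the
# (location_id, datetime) pair is what determines row uniqueness.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (location_id INTEGER, datetime TEXT, value REAL)")
rows = [
    (1, "2010-03-05 10:00", 1.5),
    (1, "2010-03-05 10:00", 1.5),  # duplicate
    (1, "2010-03-05 11:00", 2.0),
    (2, "2010-03-05 10:00", 3.0),
    (2, "2010-03-05 10:00", 3.0),  # duplicate
]
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)

# Keep the row with the lowest internal rowid in each group and
# delete the rest; which copy survives does not matter here.
conn.execute("""
    DELETE FROM readings
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM readings
        GROUP BY location_id, datetime
    )
""")
conn.commit()

remaining = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
print(remaining)  # 3 distinct (location_id, datetime) pairs remain
```

On MySQL the same pattern works with a primary-key column in place of SQLite's implicit rowid, though on a 19-million-row table the NOT IN subquery should be tested for performance first.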

answered Oct 29 '22 by Vinodkumar SC


This lists the duplicated (location_id, datetime) pairs. Note that the condition must be > 1, not > 2, or pairs that appear exactly twice would be missed; selecting only the grouped columns also keeps the query valid under strict SQL modes:

SELECT location_id, datetime, COUNT(*) AS Count
FROM table
GROUP BY location_id, datetime
HAVING Count > 1
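A grouped query like this is a useful dry run before deleting anything: it shows exactly which pairs are affected and how many copies exist. A small sketch with Python's sqlite3 module and a hypothetical readings table (HAVING threshold > 1 so pairs occurring exactly twice are caught):

```python
import sqlite3

# Hypothetical "readings" table; one (location_id, datetime) pair
# is inserted twice to simulate a duplicate.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (location_id INTEGER, datetime TEXT)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(1, "10:00"), (1, "10:00"), (1, "11:00"), (2, "10:00")],
)

# List each duplicated pair with its occurrence count.
dupes = conn.execute("""
    SELECT location_id, datetime, COUNT(*) AS cnt
    FROM readings
    GROUP BY location_id, datetime
    HAVING cnt > 1
""").fetchall()
print(dupes)  # [(1, '10:00', 2)]
```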
answered Oct 29 '22 by Sjoerd