I recently found and fixed a bug on a site I was working on that resulted in millions of duplicate rows in a table that will be quite large even without them (still in the millions). I can easily find these duplicate rows and can run a single delete query to kill them all. The problem is that trying to delete this many rows in one shot locks up the table for a long time, which I would like to avoid if possible. The only ways I can see to get rid of these rows, without taking down the site (by locking up the table), are:

1. Write a script that executes thousands of smaller delete queries in a loop. Other queries can get into the queue and run between the deletes, but it will still spike the load on the database and take a long time to run.
2. Rename the table and recreate it empty, do the cleanup on the renamed copy, then merge the new rows back in and swap the tables again. This takes considerably more steps, but should get the job done with minimal interruption.
I was just wondering if anyone else has had this problem before and, if so, how you dealt with it without taking down the site and, ideally, with minimal (if any) interruption to the users. If I go with number 2, or a different but similar approach, I can schedule everything to run late at night, do the merge early the next morning, and just let the users know ahead of time, so that's not a huge deal. I'm just looking to see if anyone has ideas for a better, or easier, way to do the cleanup.
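For anyone weighing number 2, here is a rough sketch of the table-swap approach. The table name reports, the is_duplicate criterion, and the final merge step are assumptions for illustration, not details from the original post:

CREATE TABLE reports_new LIKE reports;
-- RENAME TABLE swaps atomically, so the site switches to the empty copy at once
RENAME TABLE reports TO reports_old, reports_new TO reports;

-- Clean the duplicates out of reports_old at leisure; it no longer serves
-- live traffic, so a long-running delete will not block users.
DELETE FROM reports_old WHERE is_duplicate = 1;

-- Merge the rows that accumulated in the live table, then swap back.
-- Caution: if id is AUTO_INCREMENT, new rows may collide with old ids.
INSERT INTO reports_old SELECT * FROM reports;
RENAME TABLE reports TO reports_tmp, reports_old TO reports;
DROP TABLE reports_tmp;  -- anything written during the merge window is lost,
                         -- hence the advance notice to users

The catch, as the question implies, is that the live table is empty until the swap back, so anything reading from it comes up blank in the meantime.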
You can use a DELETE statement with a WHERE clause that identifies the duplicate rows to remove many rows from a MySQL table in a single statement.
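For example, assuming a hypothetical table logs where a duplicate means the same (user_id, created_at) pair, a self-join lets one DELETE remove every duplicate while keeping the lowest-id row in each group. This is exactly the kind of single-shot delete the question warns will lock the table for a long time:

-- delete every row that has an older twin with the same user_id/created_at
DELETE t2
FROM logs t1
JOIN logs t2
  ON  t1.user_id    = t2.user_id
  AND t1.created_at = t2.created_at
  AND t1.id < t2.id;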
DELETE FROM `table` WHERE (whatever criteria) ORDER BY `id` LIMIT 1000
Wash, rinse, repeat until zero rows affected. Maybe in a script that sleeps for a second or three between iterations.
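If you would rather keep that loop inside the database, a stored procedure can do the wash-rinse-repeat with a pause between batches. A minimal sketch, assuming the same hypothetical logs table and is_duplicate flag as the criteria; adjust the WHERE clause, batch size, and sleep to taste:

DELIMITER //
CREATE PROCEDURE purge_duplicates()
BEGIN
  DECLARE affected INT DEFAULT 1;
  WHILE affected > 0 DO
    -- delete one small batch; ORDER BY id makes the deleted rows deterministic
    DELETE FROM `logs` WHERE is_duplicate = 1 ORDER BY `id` LIMIT 1000;
    -- capture the count before any other statement resets ROW_COUNT()
    SET affected = ROW_COUNT();
    -- pause so queued queries get a turn at the table between batches
    DO SLEEP(2);
  END WHILE;
END //
DELIMITER ;

CALL purge_duplicates();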