We have a recurring process in which we want to, and need to, clean up our databases. Every client or prospect gets its own database (with 300 tables, and increasing every month), which is spun up within seconds, and seeded with some basic data.
After several months, the databases need to be cleaned up. We simply call DROP DATABASE customer_1
for each database (giving the MySQL server 10 seconds between each statement to 'rest'), followed by DROP USER 'customer_1'@'127.0.0.1')
.
Every so often, the entire database just hangs. SHOW PROCESSLIST
gives
Id User Command Time State Info
[pid] adm-user Query 300 System lock DROP DATABASE `customer_1`
No new queries will complete. Killing the relevant query pid will result in Command=Killing, and that's it. Nothing happens. The MySQL daemon cannot be stopped either, because it's still waiting on completing the query.
We've resulted to powering off the entire server, restarting it, and having MySQL do its automated crash recovery, which works fine. After which, we can drop another 10-30 databases, and then this event repeats itself.
We've read plenty on the subject, including but not limited to:
Seems like the consensus is, yes, it's MySQL that uses a global mutex lock on the table(space), combined with a large buffer pool size.
Our my.cnf:
innodb_file_per_table = 1
innodb_buffer_pool_size = 9G
innodb_log_file_size = 256M
innodb_flush_method = O_DIRECT
table_open_cache = 200000
table_definition_cache = 110000
innodb_flush_log_at_trx_commit = 2
Is there any way in which we can drop databases responsibly -- ie., without having the server go down for other prospects?
I've read that simply removing all table files could work, dropping the database afterwards, in which MySQL should simply remove references to the database.
One important thing you should do is use XFS filesystem for your MySQL datadir.
Dropping large files on ext3 filesystem takes too much time, as you no doubt read in the Percona blog you linked to. Using XFS makes dropping a large file much quicker, so the global mutex is held for a shorter time.
I would also drop the tables one at a time, to further reduce the time the mutex is held. Then after you've dropped all the tables, drop the database.
A database in MySQL is hardly a physical object at all. It's a subdirectory of the MySQL datadir, and a tiny file called db.opt
that stores a few properties of the database like its default character set (this is no longer even a separate file in MySQL 8.0). After all the tables are dropped, dropping the database itself is trivial.
Another suggestion is to first drop the customer's MySQL user, then let MySQL run for a few hours, until data from that customer's tables are no longer cached in the buffer pool. When you drop a large table, MySQL has to scan through the buffer pool to free pages belonging to that table. The larger the buffer pool, the longer this takes. So you can minimize this impact if you let the pages for that customer's tables expire and leave the buffer pool. This can take some time, because it's driven more by demand for other tables. There's no good way to force a table's page to leave the buffer pool, short of dropping the table.
I've done that in some environments. Make the "DROP TABLE" request into a RENAME TABLE to move the table into another schema that no user has access to. Then periodically run a script to really drop the tables that have been in that holding pen for more than 7 days. This gives time for the pages to be evicted from the buffer pool gradually as data from other tables displace them. Besides, it also gives a grace period for users to change their mind if they decide they dropped a table that they need after all.
Bill Karwin's recommendations seem reasonable (although RENAME TABLE
has triggered some of the same problems as DROP TABLE
in the past), but most of that stuff was supposed to have been fixed: Bug 51325 was fixed in 2011-12-20 in 5.6.4 and Bug 64284 was fixed in 2012-08-09 in 5.6.6.
You may be experiencing something related to MySQL bug 91977, for which one suggested workaround is to disable the Adaptive Hash Index while dropping tables/databases.
SET GLOBAL innodb_adaptive_hash_index = OFF;
DROP TABLE ...
SET GLOBAL innodb_adaptive_hash_index = ON;
Or maybe just drop the adaptive hash indexes entirely. See the above linked documentation which states that whether they are a net benefit or not is workload-dependent and you should do performance testing to decide whether or not to use them.
You may want to upgrade to the current MySQL 5.7.x which is 5.7.25 and file a bug report if you can still reproduce the problem.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With