This morning I noticed that our MySQL server load was going sky-high. It should peak at around 8, but at one point it hit over 100. When I checked the process list I found loads of simple UPDATE queries (incrementing a "hit counter") stuck in the "query end" state. We couldn't kill them (well, we could, but they just sat in the "killed" state indefinitely) and our site ground to a halt.
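For reference, this is how the stuck connections show up; a quick way to list them is a query against INFORMATION_SCHEMA.PROCESSLIST (available since MySQL 5.1):

    -- Show connections stuck in the "query end" state, longest-running first
    SELECT id, user, time, state, info
    FROM information_schema.processlist
    WHERE state = 'query end'
    ORDER BY time DESC;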
We had a lot of trouble restarting the service and had to forcibly kill some processes. Once we did, mysqld came back up, but the stuck processes immediately started to build up again. As far as we're aware, no configuration had been changed at this point.
So we changed innodb_flush_log_at_trx_commit from 2 to 1 (note that we need ACID compliance) in the hope that this would resolve the problem, and set the PHP/PDO connections to be persistent. This seemed to work for an hour or so, but then the connections started to run out again.
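For the record, the durability change was just this (we also put the same setting in my.cnf so it survives a restart):

    -- Flush and fsync the InnoDB log at every commit
    -- (the fully durable, ACID-compliant setting)
    SET GLOBAL innodb_flush_log_at_trx_commit = 1;

    -- Confirm the running value
    SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';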
Fortunately, I set up a slave server a couple of months ago and was able to promote it, so it's taking up the slack for now. But the slave is significantly underpowered compared to the master, so I need to switch back soon, and to do that I need to understand why this happened and how to stop it.
Does anyone have any ideas? Could it be that something needs clearing out, maybe the binary logs or something? It's extremely important that we get this server back as the master ASAP, but frankly I have no idea where to look, and everything I've tried so far has only given a temporary fix.
Help! :)
By default, the server closes a connection after 8 hours (28800 seconds) if nothing has happened on it. You can change the time limit by setting the wait_timeout variable when you start mysqld, via your server's /etc/my.cnf [...] If you don't see wait_timeout, just add it.
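If you'd rather test a value before touching my.cnf, it can also be set at runtime; a minimal sketch, where 600 is just an example value (note that SET GLOBAL only affects new connections):

    -- Check the current idle timeout, in seconds
    SHOW VARIABLES LIKE 'wait_timeout';

    -- Lower it for all NEW connections (example value)
    SET GLOBAL wait_timeout = 600;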
MySQL can run more than 50,000 simple queries per second on commodity server hardware and over 2,000 queries per second from a single correspondent on a Gigabit network, so running multiple queries isn't necessarily such a bad thing.
I'll answer my own question here. I checked the partition sizes with a simple df command and saw that /var was 100% full. I found a 10GB archive that someone had left there. I deleted that, started MySQL, and ran a PURGE BINARY LOGS BEFORE '2012-10-01 00:00:00' query to clear out a load of space, which reduced the /var/lib/mysql directory from 346GB to 169GB. Switched back to the master and everything is running great again.
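In case it helps anyone, the clean-up boiled down to the statements below; the date is of course specific to our case, and if you have replicas, make sure none of them still need the files you're about to purge:

    -- See which binary logs exist and how big they are
    SHOW BINARY LOGS;

    -- Remove everything older than the given date
    PURGE BINARY LOGS BEFORE '2012-10-01 00:00:00';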
From this I've learnt that our log files get VERY large, VERY quickly. So I'll be establishing a maintenance routine that not only keeps the log files down, but also alerts me when we're nearing a full partition.
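For the log side of that routine, one option I'm looking at is letting the server expire old binlogs itself; a sketch, with 7 days being just an example retention (the same setting belongs in my.cnf so it persists across restarts):

    -- Automatically expire binary logs older than 7 days
    SET GLOBAL expire_logs_days = 7;

    -- Confirm it took effect
    SHOW VARIABLES LIKE 'expire_logs_days';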
I hope that's some use to someone in the future who stumbles across this with the same problem. Check your drive space! :)
We've been having a very similar problem, where the mysql processlist showed that almost all of our connections were stuck in the "query end" state. Our problem was also related to replication and writing the binlog.
We changed the sync_binlog variable from 1 to 0, which means that instead of flushing binlog changes to disk on each commit, it allows the operating system to decide when to fsync() to the binlog. That entirely resolved the "query end" problem for us.
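Concretely, the change was just the following; note that 0 trades a little binlog durability for far fewer fsync() stalls, so weigh that against your recovery requirements:

    -- Let the operating system decide when to fsync() the binlog
    SET GLOBAL sync_binlog = 0;

    -- Confirm the running value
    SHOW VARIABLES LIKE 'sync_binlog';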
According to this post from Mats Kindahl, writing to the binlog won't be as much of a problem in the 5.6 release of MySQL.