Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL is very slow with DELETE query, Apache is weird while query runs

To start, a few details to describe the situation as a whole:

  • MySQL (5.1.50) database on a very beefy (32 CPU cores, 64GB RAM) FreeBSD 8.1-RELEASE machine which also runs Apache 2.2.
  • Apache gets an average of about 50 hits per second. The vast majority of these hits are API calls for a sale platform.
  • The API calls usually take about a half of a second or less to generate a result, but could take up to 30 seconds depending on third parties.
  • Each of the API calls stores a row in a database. The information stored there is important, but only for about fifteen minutes, after which it must expire.
  • In the table which stores API call information (schema for this table is below), InnoDB row-level locking is used to synchronize between threads (Apache connections, really) requesting the same information at the same time, which happens often. This means that several threads may be waiting for a lock on a row for up to 30 seconds, as API calls can take that long (but usually don't).
  • Above all, the most important thing to note is that everything works perfectly under normal circumstances.

That said, this is the very highly used table (fifty or so INSERTs per second, many SELECTs, row-level locking is utilized) I'm running the DELETE query on:

CREATE TABLE `sales` (
  `sale_id` int(32) unsigned NOT NULL auto_increment,
  `start_time` int(20) unsigned NOT NULL,
  `end_time` int(20) unsigned default NULL,
  `identifier` char(9) NOT NULL,
  `zip_code` char(5) NOT NULL,
  `income` mediumint(6) unsigned NOT NULL,
  PRIMARY KEY  USING BTREE (`sale_id`),
  UNIQUE KEY `SALE_DATA` (`ssn`,`zip_code`,`income`),
  KEY `SALE_START` USING BTREE (`start_time`)
) ENGINE=InnoDB DEFAULT CHARSET=ascii ROW_FORMAT=FIXED

The DELETE query looks like this, and is run every five minutes on cron (I'd prefer to run it once per minute):

DELETE FROM `sales` WHERE 
    `start_time` < UNIX_TIMESTAMP(NOW() - INTERVAL 30 MINUTE);

I've used INT for the time field because it is apparent that MySQL has trouble using indexes with DATETIME fields.

So this is the problem: The DELETE query seems to run fine the majority of the time (maybe 7 out of 10 times). Other times, the query finishes quickly, but MySQL seems to get choked up for awhile afterwards. I can't exactly prove it's MySQL that is acting up, but the times the symptoms happen definitely coincides with the times that this query is run. Here are the symptoms while everything is choked up:

  • Logging into MySQL and using SHOW FULL PROCESSLIST;, there are just a few INSERT INTOsales... queries running, where normally there are more than a hundred. What's abnormal here is actually the lack of any tasks in the process list, rather than there being too many. It seems MySQL stops taking connections entirely.
  • Checking Apache server-status, Apache has reached MaxClients. All threads are in "Sending reply" status.
  • Apache begins using lots of system time CPU. Load averages shoot way up, I've seen 1-minute load averages as high as 100. Normal load average for this machine is around 15. I see that it's using system CPU (as opposed to user CPU) because I use GKrellM to monitor it.
  • In top, there are many Apache processes using lots of CPU.
  • The web site and API (served by Apache of course) are unreachable most of the time. Some requests go through, but take around three or four minutes. Other requests reply after a time with a "Can't connect to MySQL server through /tmp/mysql.sock" error - this is the same error as I get when MySQL is over capacity and has too many connections (only it doesn't actually say too many connections).
  • MySQL accepts a maximum of 1024 connections, mysqltuner.pl reports "[!!] Highest connection usage: 100% (1025/1024)", meaning it's taken on more than it could handle at one point. Generally under normal conditions, there are only a few hundred concurrent MySQL connections at most. mysqltuner.pl reports no other issues, I'd be happy to paste the output if anybody wants.

Eventually, after about a minute or two, things recover on their own without any intervention. CPU usage goes back to normal, Apache and MySQL resume normal operations.

So, what can I do? :) How can I even begin to investigate why this is happening? I need that DELETE query to run for various reasons, why do things go bonkers when it's run (but not all the time)?

like image 598
Brett Gmoser Avatar asked Jan 14 '11 21:01

Brett Gmoser


1 Answers

hard one. This is not a response but the start of a brainstorming.

I would say, maybe, a re-Index problem on delete, on the doc we can find "delete quick" followed by "optimize table" to try avoiding the multi index-merge.

One other possibility, maybe as well, is a chain of dead lock on delete with at least one other thread, row locks could pause the delete operation, and the delete operation could pause some next row lock. And then you've got either a detected deadlock , or an undetected one and so a timeout occuring. How do you detect such concurrency aborted exceptions? Do you re-run your transactions? If you threads are doing a lot of different row locks in the same transactions chances are that the first deadlock will impact more and more threads (traffic jam).

Did you tried to lock the table in the delete transaction? Check the manual, the way of locking tables in transaction in Innodb or to get a SHARE LOCK on all rows. Maybe it will take you some time to get the table only for you but if your delete is quite fast no one will notice you've taken the table for you only for 1s.

Now even if you do not tried it before, it's maybe what the delete is doing. Check as well this doc on implicit locks, your delete query should be using the start_time index, so I'm quite sure your current delete is not locking all rows (not completly sure, they lock al analysed rows not only the rows matching the where condition), but the delete is quite certainly blocking inserts. Some examples of deadlocks with transaction performing deletes are explained. Good luck! For me it's too late to understand all the lock isolation impacts.

edit you could try to change your DELETE by an UPDATE setting a deleted=1, and perform the real delete on low usage times (if you have some). And change the client queries to check this indexed deleted status.

like image 67
regilero Avatar answered Oct 14 '22 00:10

regilero