I have been trying to debug a 'Lock wait timeout exceeded' error in MySQL (AWS RDS) v5.6.19a, which is sporadically thrown when I attempt to select a row, using the primary id, for update, i.e:
SELECT primary_id FROM tbl_widgets WHERE primary_id = 5 FOR UPDATE
After many hours debugging I have ruled out another part of my application 'directly' locking the same row (which was the obvious culprit). As such I have started to dig deeper into the rabbit hole that is mysql locking and noticed the following correlation between the 'Lock wait timeout exceeded' error being thrown and information provided by:
SHOW ENGINE INNODB STATUS;
There appears to be a long running TRANSACTION in the cleaning up state that is locking a slowly increasing number of rows for upto ~10 minutes, here are the relevant lines for this transaction from 10 manual INNODB STATUS queries:
2015-08-19 13:29:04
---TRANSACTION 25861246681, ACTIVE 158 sec
10 lock struct(s), heap size 1184, 21 row lock(s), undo log entries 20
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7146839061 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682
2015-08-19 13:29:42
---TRANSACTION 25861246681, ACTIVE 196 sec
13 lock struct(s), heap size 2936, 28 row lock(s), undo log entries 27
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7147149416 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682
2015-08-19 13:30:10
---TRANSACTION 25861246681, ACTIVE 224 sec
13 lock struct(s), heap size 2936, 31 row lock(s), undo log entries 30
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7147321023 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682
2015-08-19 13:30:41
---TRANSACTION 25861246681, ACTIVE 255 sec
13 lock struct(s), heap size 2936, 35 row lock(s), undo log entries 34
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7147511090 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682
2015-08-19 13:31:12
---TRANSACTION 25861246681, ACTIVE 286 sec
15 lock struct(s), heap size 2936, 38 row lock(s), undo log entries 37
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7147604774 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682
2015-08-19 13:31:30
---TRANSACTION 25861246681, ACTIVE 304 sec
21 lock struct(s), heap size 2936, 42 row lock(s), undo log entries 39
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7147789789 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682
2015-08-19 13:31:57
---TRANSACTION 25861246681, ACTIVE 331 sec
21 lock struct(s), heap size 2936, 46 row lock(s), undo log entries 43
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7147837536 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682
2015-08-19 13:32:28
---TRANSACTION 25861246681, ACTIVE 362 sec
22 lock struct(s), heap size 2936, 51 row lock(s), undo log entries 48
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7147905807 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682
2015-08-19 13:33:16
---TRANSACTION 25861246681, ACTIVE 410 sec
23 lock struct(s), heap size 2936, 58 row lock(s), undo log entries 55
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7148317478 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682
2015-08-19 13:33:49
---TRANSACTION 25861246681, ACTIVE 443 sec
24 lock struct(s), heap size 2936, 64 row lock(s), undo log entries 61
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7148471519 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682
I came across the following blog post (http://databaseblog.myname.nl/2014/10/when-your-query-is-blocked-but-there-is_26.html) that provides a potential solution which helps determine what is going on in this long running transaction, in particular to set:
set GLOBAL innodb_status_output_locks=ON;
Unfortunately it is not possible to perform this operation on RDS due to the restricted permissions.
I would kindly like to request some debugging help on how I might work out what is going on in this cleaning up transaction, and possibly how to avoid it all together.
Edit to add: Average CPU usage of the MySQL instance is 20%
To empty the content of a MySQL database use phpMyAdmin or SSH. It is highly recommended to create a backup of your database (phpMyAdmin or SSH) prior to making any changes to your databases. This is by far the easiest way to empty a MySQL database. Once in phpMyAdmin, select the database you wish to empty.
Run the 'show processlist;' query from within MySQL interactive mode prompt. (Adding the 'full' modifier to the command disables truncation of the Info column. This is necessary when viewing long queries.) Pro: Using the full modifier allows for seeing the full query on longer queries.
In Aurora the 'cleaned up' state is the final state of a connection whose work is complete but which has not been closed from the client side. In MySQL this field is left blank (no State) in the same circumstance.
This means that MySQL has some data stored on the disk (or in memory) which is yet to be read and sent over. It may be the table itself, an index, a temporary table, a sorted output etc.
In my case my "cleaning up" locks went away after I killed the JVM I was running my debugger in. Apparently they were a remnant of an earlier debug run that I interrupted before the transaction was cleaned up.
That probably doesn't help you, but here are a couple suggestions for debugging in this situation.
You do have one piece of information, which is the number of locks. Using breakpoints you can pause the application in various places to try and pinpoint exactly when the count goes up. (Or maybe it goes up only after certain errors are seen in the log; or only after certain user actions.)
If you can't use breakpoints, you do have one other tool, which is a select for update
statement that blocks after the lock has occurred. You might be able to sprinkle it around your code, possibly with additional logging, to pinpoint where the blocking starts.
Consider temporarily debugging the application against a locally installed MySQL database. This could either be installed on a local server, or on your development machine. This might be a hassle to set up, but can have many other benefits (e.g. test bed for db scripts; ability to work on laptop while offline.)
All this assumes that the lock is being caused by your own code, and not by some other job. (In your log the cleaning-up user is "mfuser".) That makes it possible for you to reproduce the problem on demand.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With