I've created a service application that uses multi-threading to process, in parallel, data stored in an InnoDB table (about 2-3 million records; the application issues no other InnoDB-related queries). Each thread runs the following queries against that table:
The guys at forum.percona.com gave me a piece of advice: do not use SELECT FOR UPDATE plus UPDATE, because the transaction takes longer to execute (two queries), which leads to lock wait timeouts. Their advice was (autocommit is on):
and it was supposed to improve performance. Instead, I got even more deadlocks and lock wait timeouts than before...
I have read a lot about optimizing InnoDB and tuned the server accordingly, so my InnoDB settings are 99% OK. This is also supported by the fact that the first scenario works fine, and better than the second one. From my.cnf:
innodb_buffer_pool_size = 512M
innodb_thread_concurrency = 16
innodb_thread_sleep_delay = 0
innodb_log_buffer_size = 4M
innodb_flush_log_at_trx_commit=2
Any ideas why the optimization had no success?
The SELECT FOR UPDATE statement is used to order transactions by controlling concurrent access to one or more rows of a table. It works by locking the rows returned by a selection query, such that other transactions trying to access those rows are forced to wait for the transaction that locked the rows to finish.
FOR UPDATE on a non-existent record does not block other transactions.
A SELECT ... FOR UPDATE reads the latest available data, setting exclusive locks on each row it reads. Thus, it sets the same locks a searched SQL UPDATE would set on the rows.
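As a sketch of this locking pattern, assuming a hypothetical table named `queue` with an `id` primary key, a `payload` column, and a `status` column (these names are illustrative, not from the question):

```sql
-- Illustrative only: table and column names are assumptions.
START TRANSACTION;

-- Lock one unclaimed row; other transactions that try to read it
-- with FOR UPDATE (or modify it) must wait until we commit.
SELECT id, payload
FROM queue
WHERE status = 'new'
ORDER BY id
LIMIT 1
FOR UPDATE;

-- Mark the row as taken so no other thread picks it up.
UPDATE queue SET status = 'processing' WHERE id = ?;

COMMIT;  -- releases the row lock
```

The longer the work done between the SELECT and the COMMIT, the longer other threads wait on that lock, which is exactly the contention the question describes.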
Think of it this way: it locks every row it had to look at. With no index on the column, it has to check every row, so every row gets locked, which effectively locks the entire table. With a UNIQUE index on the column, only one row needs to be touched, and hence only that one row is locked.
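To see why the index matters, compare these two hypothetical queries (the `queue` table and its columns are, again, assumptions for illustration):

```sql
-- No index on status: InnoDB examines every row and locks each row
-- it examines, effectively locking the whole table.
SELECT * FROM queue WHERE status = 'new' FOR UPDATE;

-- Primary-key (or UNIQUE) lookup: only the single matching row is locked.
SELECT * FROM queue WHERE id = 12345 FOR UPDATE;
```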
What I understand from the description of your process is:
If this is the case, then you are doing the right thing, as this approach takes fewer locks than the second one you mentioned.
You can decrease lock contention further by removing the DELETE statement, since it can lock large portions of the table. Instead, add a flag (a new column named processed) and update that, then delete the processed rows in one pass at the end, once all the threads are done.
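A minimal sketch of the flag approach, assuming the same hypothetical `queue` table:

```sql
-- One-time schema change: add a processed flag (illustrative name).
ALTER TABLE queue ADD COLUMN processed TINYINT NOT NULL DEFAULT 0;

-- Each thread marks its row instead of deleting it, so only that
-- single row is locked.
UPDATE queue SET processed = 1 WHERE id = ?;

-- After all threads finish, clean up in a single pass.
DELETE FROM queue WHERE processed = 1;
```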
You can also make the work distribution intelligent by batching the workload: assign each thread, up front, the row range it is going to process (perhaps by primary key). In that case each thread can run a plain SELECT, with no need for the FOR UPDATE clause, and it will be fast.
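With batched distribution, each thread could be handed a non-overlapping primary-key range, so no explicit row lock is ever needed (table and column names are assumptions):

```sql
-- Thread N reads only its own slice of the table; since the ranges
-- do not overlap, a plain SELECT suffices and FOR UPDATE is unnecessary.
SELECT id, payload
FROM queue
WHERE id >= ?   -- start of this thread's range
  AND id <  ?;  -- start of the next thread's range
```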