I've created a service application that uses multi-threading to process, in parallel, data stored in an InnoDB table (about 2-3 million records; the application issues no other InnoDB-related queries). Each thread runs the following queries against that table:
The guys at forum.percona.com gave me a piece of advice: do not use SELECT FOR UPDATE plus UPDATE, because the transaction takes longer to execute (two queries), which leads to lock wait timeouts. Their advice was (autocommit is on):
and it was supposed to improve performance. Instead, I got even more deadlocks and lock wait timeouts than before...
I have read a lot about optimizing InnoDB and tuned the server accordingly, so my InnoDB settings are 99% OK. This is also supported by the fact that the first scenario works fine, and better than the second one. From my.cnf:
innodb_buffer_pool_size = 512M
innodb_thread_concurrency = 16
innodb_thread_sleep_delay = 0
innodb_log_buffer_size = 4M
innodb_flush_log_at_trx_commit=2
Any ideas why the optimization had no success?
The SELECT FOR UPDATE statement is used to order transactions by controlling concurrent access to one or more rows of a table. It works by locking the rows returned by a selection query, such that other transactions trying to access those rows are forced to wait for the transaction that locked the rows to finish.
FOR UPDATE on a non-existent record does not block other transactions.
A SELECT ... FOR UPDATE reads the latest available data, setting exclusive locks on each row it reads. Thus, it sets the same locks a searched SQL UPDATE would set on the rows.
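As a sketch of this locking pattern, assuming a hypothetical table named `queue` with an `id` primary key, a `payload` column, and a `status` column (these names are illustrative, not from the question):

```sql
-- Illustrative only: table and column names are assumptions.
START TRANSACTION;

-- Lock one unclaimed row; other transactions that try to read it
-- with FOR UPDATE (or modify it) must wait until we commit.
SELECT id, payload
FROM queue
WHERE status = 'new'
ORDER BY id
LIMIT 1
FOR UPDATE;

-- Mark the row as taken so no other thread picks it up.
UPDATE queue SET status = 'processing' WHERE id = ?;

COMMIT;  -- releases the row lock
```

The longer the work done between the SELECT and the COMMIT, the longer other threads wait on that lock, which is exactly the contention the question describes.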
Think of it this way: it locks every row it had to look at. With no index on the column, it has to check every row, so every row gets locked, which effectively locks the entire table. With a UNIQUE index on the column, only one row needs to be touched, and hence only that one row is locked.
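To see why the index matters, compare these two hypothetical queries (the `queue` table and its columns are, again, assumptions for illustration):

```sql
-- No index on status: InnoDB examines every row and locks each row
-- it examines, effectively locking the whole table.
SELECT * FROM queue WHERE status = 'new' FOR UPDATE;

-- Primary-key (or UNIQUE) lookup: only the single matching row is locked.
SELECT * FROM queue WHERE id = 12345 FOR UPDATE;
```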
What I understand from the description of your process is:
If this is the case, then you are doing the right thing, as this approach takes fewer locks than the second one you mentioned.
You can decrease lock contention further by removing the DELETE statement, since it can lock large portions of the table. Instead, add a flag (a new column named processed) and update that, then delete the processed rows in one pass at the end, once all the threads are done.
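A minimal sketch of the flag approach, assuming the same hypothetical `queue` table:

```sql
-- One-time schema change: add a processed flag (illustrative name).
ALTER TABLE queue ADD COLUMN processed TINYINT NOT NULL DEFAULT 0;

-- Each thread marks its row instead of deleting it, so only that
-- single row is locked.
UPDATE queue SET processed = 1 WHERE id = ?;

-- After all threads finish, clean up in a single pass.
DELETE FROM queue WHERE processed = 1;
```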
You can also make the work distribution intelligent by batching the workload: assign each thread, up front, the row range it is going to process (perhaps by primary key). In that case each thread can run a plain SELECT, with no need for the FOR UPDATE clause, and it will be fast.
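With batched distribution, each thread could be handed a non-overlapping primary-key range, so no explicit row lock is ever needed (table and column names are assumptions):

```sql
-- Thread N reads only its own slice of the table; since the ranges
-- do not overlap, a plain SELECT suffices and FOR UPDATE is unnecessary.
SELECT id, payload
FROM queue
WHERE id >= ?   -- start of this thread's range
  AND id <  ?;  -- start of the next thread's range
```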