Best practices for multithreaded processing of database records

I have a single process that queries a table for records where PROCESS_IND = 'N', does some processing, and then updates the PROCESS_IND to 'Y'.

I'd like to allow for multiple instances of this process to run, but don't know what the best practices are for avoiding concurrency problems.

Where should I start?

Asked Feb 18 '09 by Mike Sickler



1 Answer

The pattern I'd use is as follows:

  • Create columns "lockedby" and "locktime": a thread/process/machine ID and a timestamp, respectively (you'll need the machine ID when you split the processing across several machines)
  • Each task would do a query such as:

    UPDATE taskstable SET lockedby=(my id), locktime=now() WHERE lockedby IS NULL ORDER BY ID LIMIT 10

Where 10 is the "batch size".

  • Then each task does a SELECT to find out which rows it has "locked" for processing, and processes those
  • After each row is complete, you set lockedby and locktime back to NULL
  • All this is done in a loop for as many batches as exist.
  • A cron job or scheduled task periodically resets the "lockedby" of any row whose locktime is too long ago; such rows were presumably claimed by a task that hung or crashed, and another worker will then pick them up

The LIMIT 10 is MySQL-specific, but other databases have equivalents. The ORDER BY is important; without it, the set of rows each worker claims is nondeterministic.
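The whole loop above can be sketched in Python with the standard-library sqlite3 module. This is a minimal illustration, not the answer's exact MySQL setup: the table and column names follow the answer, but `claim_batch`, `release`, `reap_stale_locks`, and the worker IDs are hypothetical names, and since SQLite lacks `UPDATE ... ORDER BY ... LIMIT` by default, the batch is selected with a subquery instead.

```python
import sqlite3
import time

BATCH_SIZE = 10
STALE_SECONDS = 600  # locks older than this are assumed abandoned

def claim_batch(conn, worker_id):
    # Atomically stamp a batch of unlocked rows with our ID and the
    # current time. (MySQL can use UPDATE ... ORDER BY id LIMIT 10
    # directly; SQLite needs the IN (SELECT ...) form.)
    conn.execute(
        """UPDATE taskstable
           SET lockedby = ?, locktime = ?
           WHERE id IN (SELECT id FROM taskstable
                        WHERE lockedby IS NULL
                        ORDER BY id LIMIT ?)""",
        (worker_id, time.time(), BATCH_SIZE),
    )
    conn.commit()
    # Then SELECT to find out which rows we actually "locked".
    return [row[0] for row in conn.execute(
        "SELECT id FROM taskstable WHERE lockedby = ?", (worker_id,)
    )]

def release(conn, row_id):
    # After processing a row, clear its lock (this is also where the
    # question's PROCESS_IND would be set to 'Y').
    conn.execute(
        "UPDATE taskstable SET lockedby = NULL, locktime = NULL WHERE id = ?",
        (row_id,),
    )
    conn.commit()

def reap_stale_locks(conn):
    # The cron-job step: free any row locked longer than STALE_SECONDS.
    conn.execute(
        "UPDATE taskstable SET lockedby = NULL, locktime = NULL "
        "WHERE locktime IS NOT NULL AND locktime < ?",
        (time.time() - STALE_SECONDS,),
    )
    conn.commit()
```

Because the claim is a single UPDATE, two workers running `claim_batch` concurrently can never stamp the same row, so no two workers ever process the same batch.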

Answered Oct 04 '22 by MarkR