I have a database table that contains some records to be processed. The table has a flag column that represents the following status values. 1 - ready to be processed, 2- successfully processed, 3- processing failed.
The .net code (repeating process - console/service) will grab a list of records that are ready to be processed, and loop through them and attempt to process them (Not very lengthy), update status based on success or failure.
To have better performance, I want to enable multithreading for this process. I'm thinking to spawn say 6 threads, each threads grabbing a subset.
Obviously I want to avoid having different threads process the same records. I dont want to have a "Being processed" flag in the database to handle the case where the thread crashes leaving the record hanging.
The only way I see doing this is to grab the complete list of available records and assigning a group (maybe ids) to each thread. If an individual thread fails, its unprocessed records will be picked up next time the process runs.
Is there any other alternatives to dividing the groups prior to assigning them to threads?
The most straightforward way to implement this requirement is to use the Task Parallel Library's
Parallel.ForEach (or Parallel.For).
Allow it to manage individual worker threads.
From experience, I would recommend the following:
Alternatively
Consider using a transactional queue (MSMQ or Rabbit MQ come to mind). They are optimized for this very problem.
That would be my clear choice, having done both at massive scale.
Optimizing
If it takes a non-trivial amount of time to retrieve data from the database, you can consider a Producer/Consumer pattern, which is quite straightforward to implement with a BlockingCollection. That pattern allows one thread (producer) to populate a queue with DB records to be processed, and multiple other threads (consumers) to process items off of that queue.
A New Alternative
Given that several processing steps touch the record before it is considered complete, have a look at Windows Workflow Foundation as a possible alternative.
I remember doing something like what you described...A thread checks from time to time if there is something new in database that needs to be processed. It will load only the new ids, so if at time x last id read is 1000, at x+1 will read from id 1001.
Everything it reads goes into a thread safe Queue. When items are added to this queue, you notify the working threads (maybe use autoreset events, or spawn threads here). each thread will read from this thread safe queue one item at a time, until the queue is emptied.
You should not assign before the work foreach thread (unless you know that foreach file the process takes the same amount of time). if a thread finishes the work, then it should take the load from the other ones left. using this thread safe queue, you make sure of this.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With