
What's a good strategy for processing a queue in parallel?

I'm writing a program which needs to recursively search through a folder structure, and would like to do so in parallel with several threads.

I've already written the rather trivial synchronous method: add the root directory to the queue, then repeatedly dequeue a directory and queue its subdirectories until the queue is empty. I'll use a ConcurrentQueue<T> for my queue, but I've already realized that my loops will stop prematurely. The first thread will dequeue the root directory, and every other thread could immediately see that the queue is empty and exit, leaving the first thread as the only one running. I would like each thread to loop until the queue is empty, then wait until another thread queues more directories, and keep going. I need some sort of checkpoint in my loop so that no thread exits until every thread has reached the end of the loop, but I'm not sure of the best way to do this without deadlocking when there really are no more directories to process.

dlras2 asked Dec 28 '25 16:12


1 Answer

Use the Task Parallel Library.

Create a Task to process the first folder. Inside that task, create a Task to process each subfolder (recursively) and a Task for each relevant file. Then wait on all the tasks for this folder.

The TPL runtime will make use of the thread pool, which avoids creating new threads. Creating a thread is an expensive operation for small pieces of work.
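The task-per-folder idea above could be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the names `FolderWalker`, `ProcessFolder`, and `processFile` are hypothetical, per-file work is done inline rather than in its own task, and error handling (for example, folders that cannot be read) is left out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of the TPL.
static class FolderWalker
{
    // Processes one folder: starts a child task per subfolder, handles the
    // files of this folder inline, then waits for all child tasks.
    public static void ProcessFolder(string path, Action<string> processFile)
    {
        var children = new List<Task>();

        foreach (string sub in Directory.GetDirectories(path))
        {
            string captured = sub; // local copy so each lambda gets its own value
            children.Add(Task.Factory.StartNew(
                () => ProcessFolder(captured, processFile)));
        }

        foreach (string file in Directory.GetFiles(path))
        {
            processFile(file); // assumed cheap, so no separate task per file
        }

        // Returns once every subfolder (and, recursively, its subfolders) is done.
        // If a child task failed, this throws an AggregateException.
        Task.WaitAll(children.ToArray());
    }
}
```

A call such as `FolderWalker.ProcessFolder(root, file => Interlocked.Increment(ref count));` returns only after the whole tree has been visited, so no thread has to poll a shared queue to decide when the work is finished. The `processFile` callback runs on several threads at once and therefore has to be thread-safe.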

Notes:

  • If the work per file is trivial, do it inline rather than creating another task (IO performance will be the limiting factor).
  • This approach will generally work best if blocking operations are avoided, but if IO performance is the limit then this might not matter anyway. Start simple and measure.
  • Before .NET 4, much of this can be done with the thread pool, but you'll need to use events to wait for tasks to complete, and that waiting will tie up thread pool threads. [1]

[1] As I understand it, when you wait on tasks using a TPL method, the TPL will reuse that thread for other tasks until the wait is fulfilled.
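For the pre-.NET 4 route mentioned in the last note, one possible variation is sketched below. It is an assumption-laden illustration rather than the method described above: instead of one event per work item, it keeps a single counter of outstanding folders and one `ManualResetEvent`, so only the calling thread blocks. The class name `PoolWalker` and its members are hypothetical, and exception handling is omitted (an unhandled exception on a pool thread would terminate the process).

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical helper for illustration only.
sealed class PoolWalker
{
    private readonly Action<string> _processFile;
    private readonly ManualResetEvent _done = new ManualResetEvent(false);
    private int _pending; // folders queued but not yet fully processed

    public PoolWalker(Action<string> processFile)
    {
        _processFile = processFile;
    }

    // Blocks the caller until every folder under root has been visited.
    public void Run(string root)
    {
        Enqueue(root);
        _done.WaitOne();
    }

    private void Enqueue(string path)
    {
        // Count the folder before queuing it, so the counter cannot hit
        // zero while work is still on its way to the pool.
        Interlocked.Increment(ref _pending);
        ThreadPool.QueueUserWorkItem(state => Visit(path));
    }

    private void Visit(string path)
    {
        try
        {
            foreach (string sub in Directory.GetDirectories(path)) Enqueue(sub);
            foreach (string file in Directory.GetFiles(path)) _processFile(file);
        }
        finally
        {
            // The last folder to finish signals completion.
            if (Interlocked.Decrement(ref _pending) == 0) _done.Set();
        }
    }
}
```

Because each folder increments the counter for its subfolders before decrementing its own entry, the counter reaches zero only when there really is no work left. That is the termination condition the question asks about, without worker threads checking whether a queue happens to be empty.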

Richard answered Dec 30 '25 06:12


