
Best way to manage thread pool against database queue

I have a data table full of summary entries, and my software needs to go through it, call a web service to get the details for each entry, then record those details back to the database. Looping through the table synchronously, calling the web service and waiting for each response, is too slow (there are thousands of entries). I'd like to take the entries 10 or so at a time and process them on separate threads so that 10 operations run at the same time.

My experience with C# threads is limited, to say the least, so what's the best approach? Does .NET have some sort of thread-safe queue system that I can use to make sure that the results get handled properly and in order?

Mitchell V asked Nov 17 '11 00:11



1 Answer

Depending on which version of the .NET Framework you are using, you have two pretty good options.

You can use ThreadPool.QueueUserWorkItem in any version.

int pending = table.Rows.Count;
var finished = new ManualResetEvent(false);
foreach (DataRow row in table.Rows)
{
  DataRow capture = row; // Required to close over the loop variable correctly.
  ThreadPool.QueueUserWorkItem(
    (state) =>
    {
      try
      {
        ProcessDataRow(capture);
      }
      finally
      {
         if (Interlocked.Decrement(ref pending) == 0) 
         {
           finished.Set();  // Signal completion of all work items.
         }
      }
    }, null);
}
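if (table.Rows.Count > 0)
{
  // Wait for all work items to complete. Skipped for an empty table,
  // because no work item would ever signal the event.
  finished.WaitOne();
}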

If you are using .NET Framework 4.0 or later, you can use the Task Parallel Library (TPL).

var tasks = new List<Task>();
foreach (DataRow row in table.Rows)
{
  DataRow capture = row; // Required to close over the loop variable correctly.
  tasks.Add(
    Task.Factory.StartNew(
      () =>
      {
        ProcessDataRow(capture);        
      }));
}
Task.WaitAll(tasks.ToArray()); // Wait for all work items to complete.

There are many other reasonable ways to do this. I highlight the patterns above because they are easy and work well. In the absence of specific details I cannot say for certain that either will be a perfect match for your situation, but they should be a good starting point.
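Since the question asks about a thread-safe queue, here is a minimal sketch of one of those other ways, assuming .NET 4.0 or later: load the work into a ConcurrentQueue<T> and let a fixed number of worker tasks drain it. The class name QueueWorkersSketch and the processItem delegate are hypothetical, and plain integers stand in for the DataRow objects.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class QueueWorkersSketch
{
    // Drains the queue with a fixed number of workers and returns how many
    // items were processed. processItem is a hypothetical stand-in for the
    // per-row work (web service call plus database write).
    public static int ProcessQueue(IEnumerable<int> ids, int workerCount, Action<int> processItem)
    {
        var queue = new ConcurrentQueue<int>(ids);
        int processed = 0;
        var workers = new Task[workerCount];

        for (int w = 0; w < workerCount; w++)
        {
            workers[w] = Task.Factory.StartNew(() =>
            {
                int id;
                // TryDequeue is thread-safe; each item goes to exactly one worker.
                while (queue.TryDequeue(out id))
                {
                    processItem(id);
                    Interlocked.Increment(ref processed);
                }
            }, TaskCreationOptions.LongRunning);
        }

        Task.WaitAll(workers); // Wait for every worker to find the queue empty.
        return processed;
    }
}
```

Items leave the queue in first-in, first-out order, but with several workers running, the order in which they finish is not guaranteed.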

Update:

I had a short period of subpar cerebral activity. If you have the TPL available you could also use Parallel.ForEach as a simpler method than all of that Task hocus-pocus I mentioned above.

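// DataRowCollection only implements the non-generic IEnumerable, so
// Cast<DataRow>() (requires "using System.Linq;") is needed here.
Parallel.ForEach(table.Rows.Cast<DataRow>(),
  row =>
  {
    ProcessDataRow(row);
  });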
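The question asks for roughly 10 operations at a time and for results that can be handled in order. The following is a rough sketch of one way to get both, not a definitive implementation: cap concurrency with ParallelOptions.MaxDegreeOfParallelism, and use Parallel.For (the index-based sibling of Parallel.ForEach) so each result is stored at its input index. FetchDetails is a hypothetical stand-in for the web service call, and plain integers stand in for the DataRow objects.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedParallelSketch
{
    // Hypothetical stand-in for the web service call made per row.
    public static string FetchDetails(int id)
    {
        Thread.Sleep(10); // Simulate network latency.
        return "details-" + id;
    }

    // Processes every input with at most maxParallel concurrent calls and
    // returns the results in the same order as the inputs.
    public static string[] ProcessAll(int[] ids, int maxParallel)
    {
        var results = new string[ids.Length];
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        // Each iteration writes only to its own slot, so no locking is needed.
        Parallel.For(0, ids.Length, options, i =>
        {
            results[i] = FetchDetails(ids[i]);
        });

        return results;
    }
}
```

Once ProcessAll returns, the calling thread can walk the results array and write each entry back to the database in the original order.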
Brian Gideon answered Sep 19 '22 22:09