I have a data table full of summary entries and my software needs to go through and reach out to a web service to get details, then record those details back to the database. Looping through the table synchronously while calling the web service and waiting for the response is too slow (there are thousands of entries) so I'd like to take the results (10 or so at a time) and thread it out so it performs 10 operations at the same time.
My experience with C# threads is limited to say the least, so what's the best approach? Does .NET have some sort of threadsafe queue system that I can use to make sure that the results get handled properly and in order?
A thread pool is a collection of worker threads that are reused to run queued tasks. The advantage of the thread pool pattern is that you can limit how many threads are allowed to execute simultaneously.
Since active threads consume system resources, a process that creates too many threads at the same time can run out of memory.
The thread pool creates and destroys worker threads in order to optimize throughput, which is defined as the number of tasks that complete per unit of time. Too few threads might not make optimal use of available resources, whereas too many threads could increase resource contention.
Depending on which version of the .NET Framework you are using, you have two pretty good options.
You can use ThreadPool.QueueUserWorkItem in any version.
    int pending = table.Rows.Count;
    // Start signaled when there are no rows; otherwise WaitOne() below would block forever.
    var finished = new ManualResetEvent(pending == 0);
    foreach (DataRow row in table.Rows)
    {
        DataRow capture = row; // Required to close over the loop variable correctly.
        ThreadPool.QueueUserWorkItem(
            (state) =>
            {
                try
                {
                    ProcessDataRow(capture);
                }
                finally
                {
                    // The last work item to finish brings the counter to zero.
                    if (Interlocked.Decrement(ref pending) == 0)
                    {
                        finished.Set(); // Signal completion of all work items.
                    }
                }
            }, null);
    }
    finished.WaitOne(); // Wait for all work items to complete.
If you are using .NET Framework 4.0 or later, you can use the Task Parallel Library.
    var tasks = new List<Task>();
    foreach (DataRow row in table.Rows)
    {
        DataRow capture = row; // Required to close over the loop variable correctly.
        tasks.Add(
            Task.Factory.StartNew(
                () =>
                {
                    ProcessDataRow(capture);
                }));
    }
    Task.WaitAll(tasks.ToArray()); // Wait for all work items to complete.
There are many other reasonable ways to do this. I highlight the patterns above because they are easy and work well. In the absence of specific details I cannot say for certain that either will be a perfect match for your situation, but they should be a good starting point.
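Since the question also asks about handling the results in order, here is one possible variation on the Task approach, offered as a rough sketch rather than a definitive answer. Each task returns its result instead of writing it, and the calling thread then walks the tasks in row order, so only one thread ever writes to the table. FetchDetails and the "Id" and "Details" column names are hypothetical placeholders, not anything from the original post.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Threading.Tasks;

public static class OrderedFetch
{
    // Hypothetical stand-in for the web service call; replace with the real request.
    public static string FetchDetails(int id)
    {
        return "details for " + id;
    }

    // Assumes an int column "Id" and a string column "Details" (both made up here).
    public static void FillDetails(DataTable table)
    {
        var tasks = new List<Task<string>>();
        foreach (DataRow row in table.Rows)
        {
            // Read the key on the calling thread so workers never touch the DataTable.
            int id = (int)row["Id"];
            tasks.Add(Task.Factory.StartNew<string>(() => FetchDetails(id)));
        }

        // tasks[i] belongs to table.Rows[i], so results are consumed in row order
        // even if the calls finish in a different order. Result blocks until done.
        for (int i = 0; i < tasks.Count; i++)
        {
            table.Rows[i]["Details"] = tasks[i].Result;
        }
    }
}
```

Note that this does not cap the work at 10 calls at a time; the thread pool decides how many tasks actually run concurrently.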
Update:
I had a short period of subpar cerebral activity. If you have the TPL available, you could also use Parallel.ForEach as a simpler method than all of that Task hocus-pocus I mentioned above.
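    // DataRowCollection only implements the non-generic IEnumerable, so
    // Cast<DataRow>() (from System.Linq) is needed to get an IEnumerable<DataRow>.
    Parallel.ForEach(table.Rows.Cast<DataRow>(),
        (DataRow row) =>
        {
            ProcessDataRow(row);
        });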
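If you want to hold the loop to roughly 10 operations at a time, as the question describes, Parallel.ForEach accepts a ParallelOptions argument whose MaxDegreeOfParallelism sets an upper bound on concurrent iterations. A minimal sketch follows; the BoundedProcessing class and its parameter names are made up for illustration, and processRow stands in for ProcessDataRow.

```csharp
using System;
using System.Data;
using System.Linq;
using System.Threading.Tasks;

public static class BoundedProcessing
{
    // Runs processRow once per row, with at most maxParallel calls running at once.
    public static void ProcessAll(DataTable table, Action<DataRow> processRow, int maxParallel)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        // Cast<DataRow>() turns the non-generic DataRowCollection into IEnumerable<DataRow>.
        Parallel.ForEach(table.Rows.Cast<DataRow>(), options, processRow);
    }
}
```

A call would look something like BoundedProcessing.ProcessAll(table, ProcessDataRow, 10). Keep in mind that DataTable is documented as safe for concurrent reads only, so if ProcessDataRow writes back to the row, those writes should be synchronized (for example with a lock) or deferred to a single thread.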