I have a list of table names (student, exam, school).
I use a Parallel.ForEach loop to iterate over the table names and do processing for each table, with MaxDegreeOfParallelism = 8.
My problem is that my Parallel.ForEach doesn't always engage in work stealing. For example, when two tables are left to process, they may be processed one after another instead of in parallel. I'm trying to improve performance and increase throughput.
I tried to do this by creating a custom TaskScheduler. For my implementation, however, I need a sorted list of tasks with the easiest tasks ordered first, so that they aren't held up by longer-running tables. I can't seem to do this by sorting the list passed to Parallel.ForEach (a List<string>), because the tasks are enqueued by the TaskScheduler out of order. Therefore, I need a way to sort a list of tasks inside my CustomTaskScheduler, which is based on https://psycodedeveloper.wordpress.com/2013/06/28/a-custom-taskscheduler-in-c/
How can I control the order in which tasks are passed by the Parallel.ForEach to the TaskScheduler to be enqueued?
The Parallel.ForEach method employs two different partitioning strategies depending on the type of the source. If the source is an array or a List<T>, it is partitioned statically (upfront). If the source is an honest-to-goodness¹ IEnumerable<T>, it is partitioned dynamically (on the go). Dynamic partitioning has the desirable behavior of work-stealing, but it has more overhead. In your case the overhead is not important, because your workload is coarse-grained: each item (a whole table) is a large unit of work, so the per-item cost of partitioning is negligible by comparison.
To ensure that the partitioning is dynamic, the easiest way is to wrap your source with the Partitioner.Create method:
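// Requires: using System.Collections.Concurrent; using System.Threading.Tasks;
string[] tableNames = { "student", "exam", "school" };
// With a single argument, the IEnumerable<T> overload of Partitioner.Create is selected
Parallel.ForEach(Partitioner.Create(tableNames), tableName =>
{
    // Process table
});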
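The question also asks for the cheapest tables to be handed out first, with MaxDegreeOfParallelism = 8. Here is a minimal sketch of one way to combine that with dynamic partitioning. It assumes you can supply a rough cost estimate per table. The names TableProcessor, ProcessAll, estimatedCost and processTable are hypothetical and are not part of the question or of any library. It also passes EnumerablePartitionerOptions.NoBuffering, which asks the partitioner to hand out one element at a time rather than in chunks. Items should then be picked up roughly in source order, though the exact start order across threads is not guaranteed.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class TableProcessor
{
    // Hypothetical helper: processes tables in parallel, handing them out
    // one at a time in ascending order of a caller-supplied cost estimate.
    // Returns the table names in the order in which they finished.
    public static List<string> ProcessAll(
        IReadOnlyDictionary<string, int> estimatedCost, // assumption: rough cost per table
        Action<string> processTable)
    {
        // Cheapest tables first, as requested in the question.
        IEnumerable<string> ordered = estimatedCost
            .OrderBy(pair => pair.Value)
            .Select(pair => pair.Key);

        // Dynamic partitioning without chunking: each worker takes
        // a single table at a time from the shared sequence.
        var partitioner = Partitioner.Create(
            ordered, EnumerablePartitionerOptions.NoBuffering);

        var options = new ParallelOptions { MaxDegreeOfParallelism = 8 };

        // Thread-safe collection for recording completed tables.
        var processed = new ConcurrentQueue<string>();

        Parallel.ForEach(partitioner, options, tableName =>
        {
            processTable(tableName);
            processed.Enqueue(tableName);
        });

        return processed.ToList();
    }
}
```

A call could look like TableProcessor.ProcessAll(costs, name => ProcessTable(name)), where costs maps each table name to its estimate and ProcessTable stands in for your own processing code. Because the ordering comes from the sequence handed to the partitioner, this approach needs no custom TaskScheduler.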
¹ (The expression is borrowed from a comment in the source code)