I recently discovered the code below for running lots of I/O-bound tasks effectively. It comes from this post:
Implementing a simple ForEachAsync, part 2
I'm under the impression the following are true:
- This is a better fit than Parallel.ForEach because the work is not CPU bound.
- ForEachAsync will help in queueing as many IO tasks as possible (without necessarily putting these on separate threads).
My question is: since Parallel.ForEach intrinsically has its own MaxDegreeOfParallelism defined, how do I know what value to pass for the dop parameter of the IEnumerable extension in the example code?
E.g. if I have 1000 items to process and need to make an IO-based SQL Server database call for each item, would I specify 1000 as the dop? With Parallel.ForEach, it is used as a limiter to prevent too many threads from spinning up, which might hurt performance. But here it seems to be used to partition the work into a minimum number of async tasks. I'm thinking there should at least be no maximum as such (the minimum being the total number of items to process), because I want to queue as many IO-based calls to the database as possible.
How do I go about knowing what to set the dop parameter to?
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
{
    return Task.WhenAll(
        from partition in Partitioner.Create(source).GetPartitions(dop)
        select Task.Run(async delegate {
            using (partition)
                while (partition.MoveNext())
                    await body(partition.Current);
        }));
}
"Parallel.ForEach intrinsically has its own MaxDegreeOfParallelism"
OK, the heuristics built into Parallel.ForEach are very prone to spawning huge numbers of tasks over time (if your work items have a 10 ms delay, you get hundreds of tasks after an hour or so; I measured it). A really terrible design flaw, don't try to emulate it.
When running IO in parallel there is no substitute for empirically determining the right value. That's why the TPL is so bad at it. For example, a magnetic disk doing sequential IO likes a DOP of 1. An SSD doing random IO likes a basically infinite DOP (100?).
A remote web-service gives you no way of knowing the right DOP. Not only do you need to test, you need to ask the owner for permission to spam the service with requests which might overload it.
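Determining the value empirically can be as simple as timing the same batch at a few candidate DOPs. The sketch below is only an illustration: DopBenchmark, MeasureAsync and the runWorkload callback are hypothetical names, not part of the TPL.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

public static class DopBenchmark
{
    // Runs the same workload once per candidate DOP and records how long each run took.
    // runWorkload is a hypothetical callback: it receives the DOP to use and
    // returns a task that completes when the whole batch is done.
    public static async Task<Dictionary<int, TimeSpan>> MeasureAsync(
        IEnumerable<int> candidateDops, Func<int, Task> runWorkload)
    {
        var timings = new Dictionary<int, TimeSpan>();
        foreach (int dop in candidateDops)
        {
            var stopwatch = Stopwatch.StartNew();
            await runWorkload(dop);
            stopwatch.Stop();
            timings[dop] = stopwatch.Elapsed;
        }
        return timings;
    }
}
```

With the method from the question, the callback could be something like dop => items.ForEachAsync(dop, body). A reasonable starting point is the smallest DOP after which the elapsed time stops improving.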
"would I specify 1000 as the dop?"
Then you would not need this facility at all. Just spawn all tasks, then wait for all of them. But 1000 is likely the wrong DOP because it overwhelms the DB for no benefit.
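For reference, "spawn all tasks, then wait for all of them" could look roughly like this. ProcessItemAsync is a hypothetical stand-in for the real database call, and Task.Delay only simulates the latency.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class UnboundedExample
{
    // Hypothetical stand-in for one I/O-bound call (e.g. a database query).
    public static async Task<int> ProcessItemAsync(int item)
    {
        await Task.Delay(10); // simulates I/O latency
        return item * 2;
    }

    // Starts every call up front, then waits for all of them to finish.
    public static async Task<int[]> ProcessAllAsync(IEnumerable<int> items)
    {
        // ToList() forces the lazy Select to run, so all tasks are started here.
        List<Task<int>> tasks = items.Select(item => ProcessItemAsync(item)).ToList();

        // WhenAll returns the results in the same order as the tasks.
        return await Task.WhenAll(tasks);
    }
}
```

Nothing here limits how many calls are in flight at once, which is exactly why it tends to overwhelm the target.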
"here it seems to be used to partition up the minimum number of async tasks"
Another terrible feature of Parallel.For. On machines with few CPUs it might spawn too few tasks. Horrible API. Do not use it with IO. (I use AsParallel, which allows you to set an exact DOP, not a max DOP.)
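A minimal sketch of the AsParallel approach, assuming it refers to PLINQ's WithDegreeOfParallelism. ProcessItem is a hypothetical blocking call, with Thread.Sleep standing in for the real IO.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public static class PlinqExample
{
    // Hypothetical stand-in for one blocking I/O call.
    public static int ProcessItem(int item)
    {
        Thread.Sleep(10); // simulates blocking I/O latency
        return item * 2;
    }

    // Processes all items with a DOP chosen explicitly by the caller.
    public static int[] ProcessAll(IEnumerable<int> items, int dop)
    {
        return items
            .AsParallel()
            .WithDegreeOfParallelism(dop)
            .Select(item => ProcessItem(item))
            .ToArray(); // result order is not guaranteed without AsOrdered()
    }
}
```

Unlike the async method from the question, this blocks one thread per item being processed.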
"because I want to queue as many IO based calls to the database as possible"
Why is that? Not a good plan.
Btw, the method that you posted here is good and I use this as well. I wish it was in the framework. This exact method is the answer to about 10 SO questions per week ("How can I asynchronously process 100000 items in parallel?").
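To see what dop does in that method: each partition task awaits one body call at a time, so at most dop calls are in flight at once. It is a cap, not a minimum. The sketch below counts concurrent calls to show this; ForEachAsyncDemo and MaxInFlightAsync are hypothetical names, and Task.Delay stands in for the database call.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ForEachAsyncDemo
{
    // Same method as quoted in the question.
    public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
    {
        return Task.WhenAll(
            from partition in Partitioner.Create(source).GetPartitions(dop)
            select Task.Run(async delegate
            {
                using (partition)
                    while (partition.MoveNext())
                        await body(partition.Current);
            }));
    }

    // Runs itemCount simulated I/O calls and returns the highest number
    // of calls that were in flight at the same time.
    public static async Task<int> MaxInFlightAsync(int itemCount, int dop)
    {
        int inFlight = 0;
        int maxInFlight = 0;
        object gate = new object();

        await Enumerable.Range(0, itemCount).ForEachAsync(dop, async item =>
        {
            lock (gate)
            {
                inFlight++;
                maxInFlight = Math.Max(maxInFlight, inFlight);
            }

            await Task.Delay(20); // hypothetical stand-in for a database call

            lock (gate)
            {
                inFlight--;
            }
        });

        return maxInFlight;
    }
}
```

Calling MaxInFlightAsync(50, 5) should never report more than 5, however many items there are.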