I would like to know whether we should throttle async tasks when the number of tasks to complete is large. Say you have 1000 URLs: do you fire all the requests at once and wait for all of them:
var tasks = urlList.Select(url => downloadAsync(url));
await Task.WhenAll(tasks);
Or do you batch the requests and process one batch after another:
foreach (var urlBatch in urlList.BatchEnumerable(BatchSize))
{
    var tasks = urlBatch.Select(url => downloadAsync(url));
    await Task.WhenAll(tasks);
}
I thought that batching was not necessary, because the first approach (firing all requests at once) creates tasks that are scheduled by the ThreadPool, so we should let the ThreadPool decide when to execute each task. However, I was told that in practice this only works when the tasks are compute-bound. When the tasks involve network requests, the first approach could cause the host machine to hang. Why is that?
In most cases you want to limit the degree of concurrency. Every operation running concurrently keeps some state somewhere: if the tasks are CPU-bound they sit in the ThreadPool queue waiting for a thread, and if they are async their state machines sit on the heap.
Even async operations usually consume some limited resource, be it bandwidth, ports, the remote DB server's CPU, etc.
You don't have to limit yourself to a single batch at a time, though (with batching you wait for the slowest operation in each batch to complete before starting the next one). You can throttle using a SemaphoreSlim, or even better, a TPL Dataflow block:
var block = new ActionBlock<string>(
    url => downloadAsync(url),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 10 });

urlList.ForEach(url => block.Post(url));
block.Complete();
await block.Completion;
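For comparison, here is a rough sketch of the SemaphoreSlim alternative, assuming the downloadAsync method and urlList collection from the question:

var throttler = new SemaphoreSlim(10); // at most 10 downloads in flight

var tasks = urlList.Select(async url =>
{
    await throttler.WaitAsync(); // wait for a free slot
    try
    {
        await downloadAsync(url);
    }
    finally
    {
        throttler.Release(); // free the slot for the next URL
    }
});

await Task.WhenAll(tasks);

This keeps at most 10 downloads running concurrently without waiting for an entire batch to finish before starting the next request, but the Dataflow block above gives you the same effect with less ceremony.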