Throttle async tasks?

I would like to know whether we should throttle async tasks when the number of tasks to complete is large. Say you have 1000 URLs: do you fire all the requests at once and wait for all of them:

var tasks = urlList.Select(url => downloadAsync(url));
await Task.WhenAll(tasks);

Or do you batch the requests and process one batch after another:

foreach (var urlBatch in urlList.BatchEnumerable(BatchSize)){
    var tasks = urlBatch.Select(url => downloadAsync(url));
    await Task.WhenAll(tasks);
}

I thought that batching was unnecessary, because the first approach (firing all requests at once) creates tasks that are scheduled by the ThreadPool, so we should let the ThreadPool decide when to execute each task. However, I was told that in practice this only works when the tasks are compute tasks; when the tasks involve network requests, the first approach could cause the host machine to hang. Why is that?

asked Jan 26 '16 by Tuan Nguyen

1 Answer

In most cases you want to limit concurrency to something. Whenever you have multiple operations running concurrently, some state is kept somewhere: if they are CPU-bound, the tasks sit in the ThreadPool queue waiting for a thread; if they are async, each operation's state machine sits on the heap.

Even async operations usually use up some limited resource, be it bandwidth, ports, the remote DB server's CPU, etc.

You don't have to limit yourself to a single batch at a time, though (with batching you wait for the slowest operation in the batch to complete before starting the next one). You can throttle using a SemaphoreSlim or, even better, a TPL Dataflow block:

// Process at most 10 URLs concurrently; additional posts queue up inside the block.
var block = new ActionBlock<string>(
    url => downloadAsync(url),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 10 });

urlList.ForEach(url => block.Post(url));

block.Complete();       // signal that no more items will be posted
await block.Completion; // wait for all queued downloads to finish
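
If you don't want to take a dependency on TPL Dataflow, here is a minimal sketch of the SemaphoreSlim approach mentioned above, assuming the same urlList and downloadAsync(url) as in the question:

var throttler = new SemaphoreSlim(10); // at most 10 downloads in flight

var tasks = urlList.Select(async url =>
{
    await throttler.WaitAsync(); // wait for a free slot
    try
    {
        await downloadAsync(url);
    }
    finally
    {
        throttler.Release(); // free the slot for the next URL
    }
});

await Task.WhenAll(tasks);

Unlike batching, both of these approaches keep up to 10 downloads in flight at all times instead of stalling on the slowest request in each batch.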
answered Oct 04 '22 by i3arnon