Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

WhenAll vs WaitAll in parallel

I'm trying to understand how WaitAll and WhenAll works and have following problem. There are two possible ways to get a result from a method:

  1. return Task.WhenAll(tasks).Result.SelectMany(r=> r);
  2. return tasks.Select(t => t.Result).SelectMany(r => r).ToArray();

If I understand correctly, the second case is like calling WaitAll on tasks and fetching the results after that.

It looks like the second case has much better performance. I know that the proper usage of WhenAll is with await keyword, but still, i'm wondering why there is so big difference in performance for these lines.

After analyzing the flow of the system I think I've figured out how to model the problem in a simple test application (test code is based on I3arnon answer):

    public static void Test()
    {
        var tasks = Enumerable.Range(1, 1000).Select(n => Task.Run(() => Compute(n)));

        var baseTasks = new Task[100];
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < 100; i++)
        {
            baseTasks[i] = Task.Run(() =>
            {
                tasks.Select(t => t.Result).SelectMany(r => r).ToList();
            });

        }
        Task.WaitAll(baseTasks);
        Console.WriteLine("Select - {0}", stopwatch.Elapsed);

        baseTasks = new Task[100];
        stopwatch.Restart();
        for (int i = 0; i < 100; i++)
        {
            baseTasks[i] = Task.Run(() =>
            {
                Task.WhenAll(tasks).Result.SelectMany(result => result).ToList();
            });

        }
        Task.WaitAll(baseTasks);
        Console.WriteLine("Task.WhenAll - {0}", stopwatch.Elapsed);
    }

It looks like the problem is in starting tasks from other tasks (or in Parallel loop). In that case WhenAll results in much worse performance of the program. Why is that?

like image 423
bryl Avatar asked May 12 '26 17:05

bryl


1 Answers

You are starting tasks inside a Parallel.ForEach loop which you should avoid. The whole point of Paralle.ForEach is to parallelize many small but intensive computations across the available CPU cores and starting a task is not an intensive computation. Rather it creates a task object and stores it on a queue if the task pool is saturated which it quickly will be with 1000 tasks being starteed. So now Parallel.ForEach competes with the task pool for compute resources.

In the first loop that is quite slow it seems that the scheduling is suboptimal and very little CPU is used probably because of Task.WhenAll inside the Parallel.ForEach. If you change the Parallel.ForEach to a normal for loop you will see a speedup.

But if you code really is as simple as a Compute function without any state carried forward between iterations you can get rid of the tasks and simply use Parallel.ForEach to maximize performance:

Parallel.For(0, 100, (i, s) =>
{
    Enumerable.Range(1, 1000).Select(n => Compute(n)).SelectMany(r => r).ToList();
});

As to why Task.WhenAll performs much worse you should realize that this code

tasks.Select(t => t.Result).SelectMany(r => r).ToList();

will not run the tasks in parallel. The ToList basically wraps the iteration in a foreach loop and the body of the loop creates a task and then waits for the task to complete because you retrieve the Task.Result property. So each iteration of the loop will create a task and then wait for it to complete. The 1000 tasks are executed one after the other and there is very little overhead in handling the tasks. This means that you do not need the tasks which is also what I have suggested above.

On the other hand, the code

Task.WhenAll(tasks).Result.SelectMany(result => result).ToList();

will start all the tasks and try to execute them concurrently and because the task pool is unable to execute 1000 tasks in parallel most of these tasks are queued before they are executed. This creates a big management and task switch overhead which explains the bad performance.

With regard to the final question you added: If the only purpose of the outer task is to start the inner tasks then the outer task has no useful purpose but if the outer tasks are there to perform some kind of coordination of the inner tasks then it might make sense (perhaps you want to combine Task.WhenAny with Task.WhenAll). Without more context it is hard to answer. However, your question seems to be about performance and starting 100,000 tasks may add considerable overhead.

Parallel.ForEach is a good choice if you want to perform 100,000 independent computations like you do in your example. Tasks are very good for executing concurrent activities involving "slow" calls to other systems where you want to wait for and combine results and also handle errors. For massive parallelism they are probably not the best choice.

like image 119
Martin Liversage Avatar answered May 15 '26 07:05

Martin Liversage