Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Low CPU usage with Parallel.ForEach(...)

I'm attempting to parallelize a task but I'm not getting good performance. At most I'm getting 50% CPU usage.

Below is a toy example where I build a list of lists of random words, and then go through every word and string reverse it. The reversing part is what I'm trying to parallelize.

To me this looks like a CPU-bound problem, so I don't really understand why CPU usage is so low.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

namespace Papa
{
    class Program
    {
        static void Main(string[] args)
        {
            // Build source
            var random = new Random();
            var alphabet = "abcdefghijklmnopqrstuvwxyz";

            Console.WriteLine("Building data source");
            var source = new List<string[]>();
            foreach (var i in Enumerable.Range(0, 5000000))
            {
                var words = random.Next(3, 50);
                var line = new List<string>();

                for (var j = 0; j < words; ++j)
                {
                    line.Add(
                        string.Concat(
                            Enumerable
                                .Range(0, random.Next(3, 9))
                                .Select(w => alphabet[random.Next(0, alphabet.Length - 1)])
                        )
                    );
                }

                source.Add(line.ToArray());
            }

            // Process source
            Console.WriteLine("Processing source");
            var processed = new List<string[]>();
            Parallel.ForEach(
                source,
                () => new List<string[]>(),
                (line, loop, local) =>
                {
                    var processedLine = new List<string>();
                    for (var i = 0; i < line.Count(); ++i)
                    {
                        processedLine.Add(string.Concat(line[i].Reverse()));
                    }

                    local.Add(processedLine.ToArray());
                    return local;
                },
                partitionResult =>
                {
                    lock (processed)
                    {
                        processed.AddRange(partitionResult);
                    }
                });
        }
    }
}
like image 365
mpium Avatar asked Oct 20 '25 14:10

mpium


1 Answers

Parallel.ForEach will use as many threads as the scheduler provides by default (reference). Be aware that the documentation states:

Executes a foreach (For Each in Visual Basic) operation on a Partitioner in which iterations may run in parallel and loop options can be configured.

So it might be the case it just does not seem fit with the default options.

After running your code I found the following (when hitting the parallelized part):

enter image description here

As you can see it already utilized 15Gb of memory but it's running on all cores. I'm quite sure you run into memory/GC bottlenecks rather than CPU bottlenecks.

I run some additional performance analysis. This is what I found (String::Concat can be ignored because it's part of the initialization):

enter image description here

Line 45 is taking up ~50% of the CPU time.

enter image description here

I played a bit with the code and made the following changes:

Parallel.ForEach(
             source,
             () => new List<string[]>(),
             (line, loop, local) =>
             {
                 var length = line.Length;
                 var processedLine = new string[length];
                 for (var i = 0; i < length; ++i)
                 {
                     processedLine[i] = line[i].Reverse() + "";
                 }

                 local.Add(processedLine);
                 return local;
             },
             partitionResult =>
             {
                 lock (processed)
                 {
                     processed.AddRange(partitionResult);
                 }
             });

With a doubled data-size (10000000 ~32Gb of data) this shows at least a small peak in CPU usage but it is done in less than 7 seconds so I'm not quite sure that TaskManager really gets the peak correctly. Never the less this does also not resolve the problem and the GC also runs like crazy.

My "final" conclusion would be, that it's either (or multiple) of:

  • GC
  • Memory
  • Cache misses

which is throttling you.

In my opinion, this is a nice example of why parallelization will not yield x-times the performance even if you can use x-threads.

like image 69
Alex Avatar answered Oct 23 '25 07:10

Alex