
Why isn't this Parallel.ForEach loop improving performance?

I have the following code:

    if (!this.writeDataStore.Exists(mat))
    {
        BlockingCollection<ImageFile> imageFiles = new BlockingCollection<ImageFile>();
        Parallel.ForEach(fileGrouping, fi => DecompressAndReadGzFile(fi, imageFiles));

        this.PushIntoDb(mat, imageFiles.ToList());
    }

DecompressAndReadGzFile is a static method in the same class as the method shown above. As the name suggests, it decompresses and reads .gz files. There are a lot of them (up to 1000), so the overhead of parallelisation should be worth it. However, I'm not seeing any benefit. The ANTS performance profiler shows the same timings as if no parallelisation were occurring. Process Explorer shows that work may be happening on two cores, but one core seems to be doing most of it. What am I misunderstanding about getting Parallel.ForEach to decompress and read files in parallel?

UPDATED QUESTION: What is the fastest way to read information in from a list of files?

The Problem (simplified):

  1. There is a large list of .gz files (1200).
  2. Each file has a line containing "DATA: ". Its position and line number are not fixed and can vary from file to file.
  3. We need to retrieve the first number after "DATA: " (just for simplicity's sake) and store it in an object in memory (e.g. a List).
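
For reference, the per-file work described in the list above might look roughly like the following. This is only a sketch: the class and method names are made up for illustration, and it assumes the number follows "DATA: " directly (optional whitespace, then an integer or decimal), which the real files may not guarantee.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.IO.Compression;
using System.Text.RegularExpressions;

// Hypothetical helper, not the DecompressAndReadGzFile method from the question.
public static class GzDataReader
{
    // Assumption: the number comes straight after the "DATA: " marker.
    private static readonly Regex DataPattern =
        new Regex(@"DATA:\s*(-?\d+(?:\.\d+)?)", RegexOptions.Compiled);

    // Decompresses the stream line by line and returns the first number
    // after "DATA: ", or null if no line contains one. Reading stops at
    // the first match, so the rest of the file is never decompressed.
    public static double? ReadFirstDataValue(Stream compressed)
    {
        using (var gzip = new GZipStream(compressed, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                Match match = DataPattern.Match(line);
                if (match.Success)
                {
                    return double.Parse(match.Groups[1].Value, CultureInfo.InvariantCulture);
                }
            }
        }
        return null;
    }

    // Convenience overload for a file on disk.
    public static double? ReadFirstDataValue(string path)
    {
        using (FileStream file = File.OpenRead(path))
        {
            return ReadFirstDataValue(file);
        }
    }
}
```

Because the method streams the file and returns at the first match, how much decompression each file needs depends on how early the "DATA: " line appears.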

In the initial question, I was using the Parallel.ForEach loop but I didn't seem to be CPU bound on more than 1 core.

Seth asked Nov 10 '11 07:11



1 Answer

Is it possible that the threads are spending most of their time waiting for IO? By reading multiple files at a time, you may be making the disk thrash more than it would with a single operation. It's possible that you could improve performance by using a single thread reading sequentially, but then doling out the CPU-bound decompression to separate threads... but you may actually find that you only really need one thread performing the decompression anyway, if the disk is slower than the decompression process itself.
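
A rough sketch of that "one reader, several decompressors" arrangement is below. The class name, the `process` delegate and the queue capacity of 16 are all illustrative assumptions, and `Task.Run` needs .NET 4.5 or later (on .NET 4.0, `Task.Factory.StartNew` would be the nearest equivalent).

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical pipeline: names and sizes are illustrative only.
public static class SequentialReadPipeline
{
    // One task reads the files one after another, so the disk only ever sees
    // one request at a time. `workerCount` tasks take the raw bytes and run
    // the CPU-bound `process` step (e.g. decompression) on them.
    public static List<TResult> Run<TResult>(
        IEnumerable<string> paths,
        Func<byte[], TResult> process,
        int workerCount)
    {
        // Bounded, so the reader cannot run far ahead of the workers
        // and fill memory with unprocessed files.
        using (var queue = new BlockingCollection<byte[]>(boundedCapacity: 16))
        {
            var results = new ConcurrentBag<TResult>();

            Task reader = Task.Run(() =>
            {
                try
                {
                    foreach (string path in paths)
                    {
                        queue.Add(File.ReadAllBytes(path));
                    }
                }
                finally
                {
                    // Signals the workers that no more items are coming.
                    queue.CompleteAdding();
                }
            });

            Task[] workers = Enumerable.Range(0, workerCount)
                .Select(_ => Task.Run(() =>
                {
                    foreach (byte[] raw in queue.GetConsumingEnumerable())
                    {
                        results.Add(process(raw));
                    }
                }))
                .ToArray();

            Task.WaitAll(workers);
            reader.Wait();

            // Note: results come back in completion order, not file order.
            return results.ToList();
        }
    }
}
```

Running it with `workerCount` set to 1 and then to `Environment.ProcessorCount` is a cheap way to check whether more than one decompression thread helps at all. If the timings match, the disk is the limit.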

One way to test this would be to copy the files requiring decompression onto a ramdisk first and still use your current code. I suspect you'll then find you're CPU-bound, and all the processors are busy almost all the time.
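
If setting up a ramdisk is inconvenient, a similar (not identical) check is to time the disk reads and the decompression separately. The sketch below is an assumption-laden substitute for the ramdisk test, not the same experiment: it loads every compressed file into memory first, so it only suits data sets that fit in RAM, and the class name is made up.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical measurement helper.
public static class IoVersusCpuTiming
{
    // Item1: time spent reading the compressed bytes from disk, sequentially.
    // Item2: time spent decompressing those bytes from memory, in parallel.
    public static Tuple<TimeSpan, TimeSpan> Measure(string[] paths)
    {
        Stopwatch readClock = Stopwatch.StartNew();
        byte[][] compressed = paths.Select(p => File.ReadAllBytes(p)).ToArray();
        readClock.Stop();

        Stopwatch cpuClock = Stopwatch.StartNew();
        Parallel.ForEach(compressed, raw =>
        {
            using (var gzip = new GZipStream(new MemoryStream(raw), CompressionMode.Decompress))
            {
                // Decompress everything and throw the output away.
                gzip.CopyTo(Stream.Null);
            }
        });
        cpuClock.Stop();

        return Tuple.Create(readClock.Elapsed, cpuClock.Elapsed);
    }
}
```

If the read time dominates, extra threads are unlikely to help. Bear in mind that the operating system's file cache can make a second run read much faster than the first, so the first run after a reboot is the more honest figure.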

(You should also consider what you're doing with the decompressed files. Are you writing those back to disk? If so, again there's the possibility that you're basically waiting for a thrashing disk.)

Jon Skeet answered Sep 30 '22 10:09