I have the following code:
if (!this.writeDataStore.Exists(mat))
{
    BlockingCollection<ImageFile> imageFiles = new BlockingCollection<ImageFile>();
    Parallel.ForEach(fileGrouping, fi => DecompressAndReadGzFile(fi, imageFiles));
    this.PushIntoDb(mat, imageFiles.ToList());
}
DecompressAndReadGzFile is a static method in the same class that this method is contained in. As the method name suggests, I am decompressing and reading gz files, lots of them, i.e. up to 1000, so the overhead of parallelisation should be worth it for the benefits. However, I'm not seeing the benefits. When I use the ANTS performance profiler I see that the calls run at exactly the same times as if no parallelisation is occurring. I also checked the CPU cores with Process Explorer and it looks like there is possibly work being done on two cores, but one core seems to be doing most of the work. What am I not understanding as far as getting Parallel.ForEach to decompress and read files in parallel?
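One quick way to confirm whether the iterations really are overlapping is to log which thread runs each one and how many are in flight at once. The sketch below is purely illustrative: DoWorkOnFile and the file list are stand-ins, since DecompressAndReadGzFile itself isn't shown here.

using System;
using System.Threading;
using System.Threading.Tasks;

class ParallelDiagnostics
{
    static int running; // iterations executing right now

    // Placeholder for the real per-file work (decompress + read).
    static void DoWorkOnFile(string path)
    {
        Thread.Sleep(200); // simulate some work
    }

    static void Main()
    {
        string[] files = { "a.gz", "b.gz", "c.gz", "d.gz", "e.gz" }; // stand-in list

        Parallel.ForEach(files, path =>
        {
            int now = Interlocked.Increment(ref running);
            Console.WriteLine("Thread {0} started {1} ({2} running concurrently)",
                Thread.CurrentThread.ManagedThreadId, path, now);
            DoWorkOnFile(path);
            Interlocked.Decrement(ref running);
        });
    }
}

If the per-file work is genuinely CPU-bound, the "running concurrently" count should climb well above 1 on a multi-core machine.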
UPDATED QUESTION: What is the fastest way to read information in from a list of files?
The Problem (simplified):
In the initial question, I was using the Parallel.ForEach loop but I didn't seem to be CPU bound on more than 1 core.
In many cases, Parallel.For and Parallel.ForEach can provide significant performance improvements over ordinary sequential loops. However, parallelizing a loop introduces complexity that can lead to problems that are less common, or not encountered at all, in sequential code.
A Parallel.ForEach loop can be much faster than an ordinary foreach loop, but only when each iteration does enough independent, CPU-bound work.
Note that Parallel.ForEach itself blocks the calling thread until all iterations have finished; the individual iterations are run on thread-pool worker threads.
So no, you should not simply use Parallel.ForEach or related constructs on every loop you can. Parallelism has overhead, which is not justified in loops with few, fast iterations.
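As a rough illustration of that overhead (an illustrative micro-benchmark, not code from the question): with trivial per-item work such as summing integers, the parallel version is usually no faster and can easily be slower than the plain loop, because scheduling and synchronisation dominate.

using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class OverheadDemo
{
    static void Main()
    {
        int[] numbers = Enumerable.Range(0, 1000000).ToArray();
        long total = 0;

        var sw = Stopwatch.StartNew();
        foreach (int n in numbers) total += n;          // trivial work per item
        sw.Stop();
        Console.WriteLine("Sequential: {0} ms", sw.ElapsedMilliseconds);

        total = 0;
        sw.Restart();
        // Each iteration does almost nothing, so the parallel machinery is pure overhead.
        Parallel.ForEach(numbers, x => Interlocked.Add(ref total, x));
        sw.Stop();
        Console.WriteLine("Parallel:   {0} ms", sw.ElapsedMilliseconds);
    }
}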
Is it possible that the threads are spending most of their time waiting for IO? By reading multiple files at a time, you may be making the disk thrash more than it would with a single operation. It's possible that you could improve performance by using a single thread reading sequentially, but then doling out the CPU-bound decompression to separate threads... but you may actually find that you only really need one thread performing the decompression anyway, if the disk is slower than the decompression process itself.
One way to test this would be to copy the files requiring decompression onto a ramdisk first and still use your current code. I suspect you'll then find you're CPU-bound, and all the processors are busy almost all the time.
(You should also consider what you're doing with the decompressed files. Are you writing those back to disk? If so, again there's the possibility that you're basically waiting for a thrashing disk.)
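For what it's worth, here is a minimal sketch of that "single reader, separate decompression threads" idea, assuming a bounded BlockingCollection hands the compressed bytes from one reading task to a couple of decompressing tasks. The class name, the byte[] hand-off and the stubbed-out result handling are illustrative assumptions, not code from the original question or answer.

using System;
using System.Collections.Concurrent;
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

class SingleReaderPipeline
{
    static void Main(string[] args)
    {
        string[] gzPaths = args; // the list of .gz files to process

        // Bounded queue so the reader cannot run arbitrarily far ahead of the workers.
        var compressed = new BlockingCollection<byte[]>(boundedCapacity: 16);
        var results = new ConcurrentBag<long>(); // stand-in for the real ImageFile results

        // One task reads the files sequentially (disk-friendly) ...
        var reader = Task.Factory.StartNew(() =>
        {
            try
            {
                foreach (string path in gzPaths)
                    compressed.Add(File.ReadAllBytes(path));
            }
            finally
            {
                compressed.CompleteAdding(); // let the workers drain and exit
            }
        });

        // ... while a small, fixed number of tasks do the CPU-bound decompression.
        int workerCount = 2; // tune; one may be enough if the disk is the bottleneck
        var workers = new Task[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            workers[i] = Task.Factory.StartNew(() =>
            {
                foreach (byte[] data in compressed.GetConsumingEnumerable())
                {
                    using (var gz = new GZipStream(new MemoryStream(data), CompressionMode.Decompress))
                    using (var decompressed = new MemoryStream())
                    {
                        gz.CopyTo(decompressed);
                        results.Add(decompressed.Length); // parse into ImageFile here instead
                    }
                }
            });
        }

        Task.WaitAll(workers);
        reader.Wait();
        Console.WriteLine("Decompressed {0} files", results.Count);
    }
}

The bounded capacity keeps the reader from loading the whole file set into memory if the decompression side falls behind, and workerCount can be dropped to 1 if the disk turns out to be the bottleneck, as suggested above.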