I have a task which reads a large file line by line, does some logic with each line, and returns a string that I need to write to a file. The order of the output does not matter. However, when I run the code below, it stops or gets very slow after reading 15-20k lines of my file.
public static Object FileLock = new Object();
...
Parallel.ForEach(System.IO.File.ReadLines(inputFile), (line, _, lineNumber) =>
{
    var output = MyComplexMethodReturnsAString(line);
    lock (FileLock)
    {
        using (var file = System.IO.File.AppendText(outputFile))
        {
            file.WriteLine(output);
        }
    }
});
Why does my program slow down after running for a while? Is there a more correct way to perform this task?
Parallel.ForEach is like the foreach loop in C#, except that the foreach loop runs on a single thread and processes items sequentially, while Parallel.ForEach runs on multiple threads and processes items in parallel.
Use Parallel.ForEach for the simplest use case, where you just need to perform an action for each item in a collection. Use the PLINQ methods when you need to do more, e.g. query the collection or stream the data.
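As a small illustration of the difference, a sequential foreach, a Parallel.ForEach, and a PLINQ query over the same data might look like the sketch below (the Square helper is illustrative, not from the question):

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class ParallelDemo
{
    static int Square(int x) => x * x;

    static void Main()
    {
        var numbers = Enumerable.Range(1, 10).ToArray();

        // Sequential foreach: one thread, items processed in order.
        foreach (var n in numbers)
            Console.WriteLine(Square(n));

        // Parallel.ForEach: multiple threads, completion order not guaranteed.
        Parallel.ForEach(numbers, n => Console.WriteLine(Square(n)));

        // PLINQ: a parallel query whose results you can consume further.
        var squares = numbers.AsParallel().Select(Square).ToArray();
        Console.WriteLine(squares.Sum()); // 385
    }
}
```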
The short answer is no, you should not just use Parallel. ForEach or related constructs on each loop that you can. Parallel has some overhead, which is not justified in loops with few, fast iterations. Also, break is significantly more complex inside these loops.
Parallel.ForEach supports cancellation through the use of cancellation tokens. In a parallel loop, you supply the CancellationToken to the method in the ParallelOptions parameter and then enclose the parallel call in a try-catch block.
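A minimal sketch of that pattern (the workload and timings are illustrative): the token goes into ParallelOptions, and the loop throws OperationCanceledException once cancellation is observed.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class CancellationDemo
{
    static void Main()
    {
        var cts = new CancellationTokenSource();
        var options = new ParallelOptions { CancellationToken = cts.Token };

        // Request cancellation shortly after the loop starts.
        cts.CancelAfter(100);

        try
        {
            Parallel.ForEach(Enumerable.Range(0, 1_000_000), options, i =>
            {
                // Simulate work; the loop checks the token between iterations.
                Thread.SpinWait(10_000);
            });
        }
        catch (OperationCanceledException)
        {
            Console.WriteLine("Loop was canceled.");
        }
    }
}
```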
You've essentially serialized your query by having all threads contend to write to the file. Instead, compute everything that needs to be written in parallel, then write it out at the end.
var processedLines = File.ReadLines(inputFile).AsParallel()
    .Select(l => MyComplexMethodReturnsAString(l));
File.AppendAllLines(outputFile, processedLines);
If you need to flush the data as it comes, open a stream and enable auto flushing (or flush manually):
var processedLines = File.ReadLines(inputFile).AsParallel()
    .Select(l => MyComplexMethodReturnsAString(l));
using (var output = File.AppendText(outputFile))
{
    output.AutoFlush = true;
    foreach (var processedLine in processedLines)
        output.WriteLine(processedLine);
}
This has to do with how Parallel.ForEach's internal load balancer works. When it sees that your threads spend a lot of time blocking, it reasons that it can speed things up by throwing more threads at the problem, leading to higher parallel overheads, contention for your FileLock and overall performance degradation.
Why is this happening? Because Parallel.ForEach is not meant for IO work.
How can you fix this? Use Parallel.ForEach for CPU work only and perform all IO outside of the parallel loop.
A quick workaround is to limit the number of threads Parallel.ForEach is allowed to enlist, by using the overload which accepts ParallelOptions, like so:
Parallel.ForEach(
    System.IO.File.ReadLines(inputFile),
    new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
    (line, _, lineNumber) =>
    {
        ...
    });
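Putting the advice above together, one way to keep all IO outside of the parallel loop is to buffer results in a thread-safe collection and write them in one pass afterwards. This is a sketch under assumptions: the file names are illustrative, and ToUpperInvariant stands in for the question's CPU-bound MyComplexMethodReturnsAString.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

class SeparateCpuFromIo
{
    // Stand-in for the question's CPU-bound method (illustrative).
    static string MyComplexMethodReturnsAString(string line) => line.ToUpperInvariant();

    static void Main()
    {
        const string inputFile = "input.txt";
        const string outputFile = "output.txt";
        File.WriteAllLines(inputFile, new[] { "alpha", "beta", "gamma" }); // demo input

        // CPU-bound work runs in parallel; results buffer in a thread-safe bag.
        var results = new ConcurrentBag<string>();
        Parallel.ForEach(File.ReadLines(inputFile), line =>
            results.Add(MyComplexMethodReturnsAString(line)));

        // All IO happens once, on a single thread, outside the loop.
        File.WriteAllLines(outputFile, results);
        Console.WriteLine(File.ReadAllLines(outputFile).Length); // 3
    }
}
```

Since the order of the output does not matter here, a ConcurrentBag is sufficient; if order mattered, PLINQ with AsOrdered would be a better fit.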