
How to parallelize file writing using TPL?

I am trying to save a list of strings to multiple files, each string in a different file, and to do it simultaneously. I do it like this:

public async Task SaveToFilesAsync(string path, List<string> list, CancellationToken ct)
{
    int count = 0;
    foreach (var str in list)
    {
        string fullPath = path + @"\" + count.ToString() + "_element.txt";
        using (var sw = File.CreateText(fullPath))
        {
            await sw.WriteLineAsync(str);
        }
        count++;

        NLog.Trace("Saved in thread: {0} to {1}", 
           Environment.CurrentManagedThreadId,
           fullPath);

        ct.ThrowIfCancellationRequested();
    }
}

And call it like this:

try
{
   var savingToFilesTask = SaveToFilesAsync(@"D:\Test", myListOfString, ct);
   await savingToFilesTask; // without an await here, the catch below never observes the cancellation
}
catch(OperationCanceledException)
{
   NLog.Info("Operation has been cancelled by user.");
}

But in the log file I can clearly see that every save happens on the same thread ID, so no parallelism is going on. What am I doing wrong, and how do I fix it? My goal is to make the saving as fast as possible, using all of the computer's cores.

asked Mar 28 '26 15:03 by Shay


1 Answer

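Essentially, your problem is that foreach is synchronous: it walks an IEnumerable one item at a time, and because each iteration awaits its write before the next one begins, the files are written one after another.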

To work around this, first encapsulate the loop body into an asynchronous function.

public async Task WriteToFile(
        string path,
        string str,
        int count)
{
    var fullPath = string.Format("{0}\\{1}_element.txt", path, count);
    using (var sw = File.CreateText(fullPath))
    {
        await sw.WriteLineAsync(str);
    }

    NLog.Trace("Saved in TaskID: {0} to \"{1}\"", 
       Task.CurrentId,
       fullPath);
}

Then, instead of looping synchronously, project the sequence of strings to a sequence of tasks, each performing your encapsulated loop body. This is not an asynchronous operation in itself, but the projection does not block, i.e. there is no await between starting one write and starting the next.

Then await all the tasks; they finish in an order determined by the task scheduler.

public async Task SaveToFilesAsync(
        string path,
        IEnumerable<string> list,
        CancellationToken ct)
{
    await Task.WhenAll(list.Select((str, count) => WriteToFile(path, str, count)));
}
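To try the two pieces above together, here is a minimal console sketch. It is an illustration under assumptions rather than a drop-in replacement: the class name `ParallelSaveDemo`, the `Async` suffix on the helper, the scratch directory under the temp folder and the sample strings are all hypothetical, and the logging and the unused token are left out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelSaveDemo
{
    // Writes one string to "<index>_element.txt" inside the given directory.
    public static async Task WriteToFileAsync(string directory, string text, int index)
    {
        string fullPath = Path.Combine(directory, index + "_element.txt");
        using (var writer = File.CreateText(fullPath))
        {
            await writer.WriteLineAsync(text);
        }
    }

    // Starts one write per string without awaiting in between,
    // then returns a task that completes when every write has completed.
    public static Task SaveToFilesAsync(string directory, IEnumerable<string> items)
    {
        return Task.WhenAll(
            items.Select((text, index) => WriteToFileAsync(directory, text, index)));
    }

    public static async Task Main()
    {
        // Hypothetical scratch location; any writable directory would do.
        string directory = Path.Combine(Path.GetTempPath(), "parallel_save_demo");
        Directory.CreateDirectory(directory);

        await SaveToFilesAsync(directory, new[] { "alpha", "beta", "gamma" });

        // Report how many element files are now present in the directory.
        Console.WriteLine(Directory.GetFiles(directory, "*_element.txt").Length);
    }
}
```

`Path.Combine` is used here in place of manual backslash concatenation only as a convenience; it does not change the approach.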

There is nothing to cancel, so there is no point passing the cancellation token down.

I've used the indexing overload of Select to provide the count value.
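For readers who have not seen that overload, a small standalone sketch of it follows. The class and method names are made up for illustration; the only library feature shown is the `Select` overload whose lambda takes the element and its zero-based index.

```csharp
using System;
using System.Linq;

public static class IndexedSelectDemo
{
    // Prefixes each item with its zero-based position in the sequence.
    public static string[] Label(string[] items)
    {
        // The second lambda parameter is supplied by Select itself.
        return items.Select((item, index) => index + "_" + item).ToArray();
    }

    public static void Main()
    {
        foreach (var label in Label(new[] { "a", "b", "c" }))
        {
            Console.WriteLine(label); // prints 0_a, 1_b, 2_c on separate lines
        }
    }
}
```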

I've changed your logging code to use the current Task ID; this avoids any confusion around scheduling.
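One possible refinement of the WriteToFile helper, offered as a sketch rather than as part of the approach above: File.CreateText does not open the file for asynchronous I/O, so a variant can construct the FileStream itself with useAsync set to true. The class name `AsyncFileWriter` and the 4096-byte buffer size are assumptions, and whether this is measurably faster depends on the disk and the runtime, so it is worth measuring before adopting.

```csharp
using System.IO;
using System.Threading.Tasks;

public static class AsyncFileWriter
{
    // Variant of the helper that requests asynchronous file I/O explicitly.
    public static async Task WriteToFileAsync(string directory, string text, int index)
    {
        string fullPath = Path.Combine(directory, index + "_element.txt");

        // useAsync: true asks the OS for asynchronous (overlapped) I/O on this handle.
        using (var stream = new FileStream(
            fullPath, FileMode.Create, FileAccess.Write, FileShare.None,
            bufferSize: 4096, useAsync: true))
        using (var writer = new StreamWriter(stream))
        {
            await writer.WriteLineAsync(text);
            // Flush asynchronously so that Dispose has no buffered text left to write.
            await writer.FlushAsync();
        }
    }
}
```

It can be passed to the same Select projection in place of WriteToFile.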

answered Mar 31 '26 05:03 by Jodrell


