
File.Copy in Parallel.ForEach

I'm trying to create a directory and copy a file (pdf) inside a Parallel.ForEach.

Below is a simple example:

    private static void CreateFolderAndCopyFile(int index)
    {
        const string sourcePdfPath = "c:\\testdata\\test.pdf";
        const string rootPath = "c:\\testdata";

        string folderDirName = string.Format("Data{0}", string.Format("{0:00000000}", index));

        string folderDirPath = rootPath + @"\" + folderDirName;

        Directory.CreateDirectory(folderDirPath);

        string desPdfPath = folderDirPath + @"\" + "test.pdf";

        File.Copy(sourcePdfPath, desPdfPath, true);

    }

The method above creates a new folder and copies the PDF file into it. Repeated calls produce this directory tree:

TESTDATA
  -Data00000000
      -test.pdf
  -Data00000001
      -test.pdf
....
  -Data0000000N
      -test.pdf

I tried calling the CreateFolderAndCopyFile method in a Parallel.ForEach loop.

    private static void Func<T>(IEnumerable<T> docs)
    {
        int index = 0;
        Parallel.ForEach(docs, doc =>
                                   {
                                       CreateFolderAndCopyFile(index);
                                       index++;
                                   });
    }

When I run this code it finishes with the following error:

The process cannot access the file 'c:\testdata\Data00001102\test.pdf' because it is being used by another process.

Before the error occurred, however, it had already created about 1111 new folders and copied test.pdf into each of them.

What caused this behaviour and how can it be resolved?

EDITED:

The code above was a toy sample; sorry for the hard-coded strings. Conclusion: the parallel method is slow.

Tomorrow I will try some methods from How to write super-fast file-streaming code in C#?,

especially: http://designingefficientsoftware.wordpress.com/2011/03/03/efficient-file-io-from-csharp/

asked Mar 28 '12 by Mike

2 Answers

Your increment operation on index is suspect because it is not thread safe. If you change the operation to Console.WriteLine("{0}", index++) you will see duplicated and skipped values.
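A minimal sketch (my own illustration, not from the question) of how an unsynchronized `index++` loses updates when run from Parallel.ForEach:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class RaceDemo
{
    static void Main()
    {
        int index = 0;

        // 100,000 iterations incrementing a shared counter with no synchronization.
        Parallel.ForEach(Enumerable.Range(0, 100_000), _ => { index++; });

        // The final value is usually LESS than 100000, because ++ is a
        // read-modify-write: two threads can read the same value and
        // overwrite each other's increment. Two loop iterations then end
        // up using the same index, which is exactly how two copies target
        // the same test.pdf path in the question.
        Console.WriteLine(index);
    }
}
```

Because the result is nondeterministic you may occasionally see exactly 100000, but any shortfall demonstrates the race.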

Instead you could use a Parallel.ForEach overload with a loop index:

private static void Func<T>(IEnumerable<T> docs)
{
    // nb: the overload's loop index is 'long', not 'int',
    // so cast it to match CreateFolderAndCopyFile(int)
    Parallel.ForEach(docs, (doc, state, index) =>
                            {
                                CreateFolderAndCopyFile((int)index);
                            });
}
answered Nov 08 '22 by user7116


You are not synchronizing access to index, which means you have a race on it: two threads can read the same value, so both end up copying to the same destination file at the same time. That's why you get the "being used by another process" error. For illustrative purposes, you can avoid the race and keep this particular design by using Interlocked.Increment.

private static void Func<T>(IEnumerable<T> docs)
{
    int index = -1;
    Parallel.ForEach(
        docs, doc =>
        {
            // Interlocked.Increment takes the variable by reference
            // and returns the incremented value atomically
            int nextIndex = Interlocked.Increment(ref index);
            CreateFolderAndCopyFile(nextIndex);
        }
    );
}

However, as others suggest, the alternative overload of ForEach that provides a loop index is clearly a cleaner solution to this particular problem.

But when you get it working you will find that copying files is I/O bound rather than processor bound, and I predict that the parallel code will be slower than the serial code.
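For comparison, a serial version needs no synchronization at all (a sketch assuming the CreateFolderAndCopyFile method from the question):

```csharp
private static void FuncSerial<T>(IEnumerable<T> docs)
{
    int index = 0;
    foreach (T doc in docs)
    {
        // Sequential execution: no race on index, and only one
        // copy operation hits the disk at a time, which tends to
        // suit a single spinning disk better than parallel writes.
        CreateFolderAndCopyFile(index);
        index++;
    }
}
```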

answered Nov 08 '22 by David Heffernan