I'm trying to create a directory and copy a file (pdf) inside a Parallel.ForEach.
Below is a simple example:
    private static void CreateFolderAndCopyFile(int index)
    {
        const string sourcePdfPath = "c:\\testdata\\test.pdf";
        const string rootPath = "c:\\testdata";
        string folderDirName = string.Format("Data{0}", string.Format("{0:00000000}", index));
        string folderDirPath = rootPath + @"\" + folderDirName;
        Directory.CreateDirectory(folderDirPath);
        string desPdfPath = folderDirPath + @"\" + "test.pdf";
        File.Copy(sourcePdfPath, desPdfPath, true);
    }
The method above creates a new folder and copies the pdf file to a new folder. It creates this dir tree:
TESTDATA
  -Data00000000
      -test.pdf
  -Data00000001
      -test.pdf
....
  -Data0000000N
      -test.pdf
I tried calling the CreateFolderAndCopyFile method in a Parallel.ForEach loop.
    private static void Func<T>(IEnumerable<T> docs)
    {
        int index = 0;
        Parallel.ForEach(docs, doc =>
                                   {
                                       CreateFolderAndCopyFile(index);
                                       index++;
                                   });
    }
When I run this code it finishes with the following error:
The process cannot access the file 'c:\testdata\Data00001102\test.pdf' because it is being used by another process.
But before failing it had created about 1111 new folders and copied test.pdf into each of them.
What caused this behaviour and how can it be resolved?
EDIT:
The code above was a toy sample; sorry for the hard-coded strings. Conclusion: the parallel method is slow.
Tomorrow I will try some methods from How to write super-fast file-streaming code in C#?,
especially: http://designingefficientsoftware.wordpress.com/2011/03/03/efficient-file-io-from-csharp/
Your increment of index is suspect because it is not thread safe: index++ is a read-modify-write, so multiple threads can read the same value before any of them writes the incremented value back. If you change the loop body to Console.WriteLine("{0}", index++) you will see duplicate indices printed.
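To see the race concretely, here is a minimal stand-alone sketch (not part of the original post) in which unsynchronized parallel increments lose updates:

```csharp
using System;
using System.Threading.Tasks;

class RaceDemo
{
    static void Main()
    {
        int index = 0;
        // 100000 unsynchronized increments running in parallel.
        // index++ is a read-modify-write, so concurrent threads can
        // read the same value and overwrite each other's update;
        // the final count is typically well below 100000.
        Parallel.For(0, 100000, _ => { index++; });
        Console.WriteLine("Expected 100000, got {0}", index);
    }
}
```

In the question's code the same lost update means two iterations can call CreateFolderAndCopyFile with the same index, so two threads try to copy to the same test.pdf at once, which is exactly the "being used by another process" error reported.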
Instead you could use a Parallel.ForEach overload with a loop index:
private static void Func<T>(IEnumerable<T> docs)
{
    // nb: the loop index supplied by this overload is 'long', not 'int',
    // so it must be cast before calling CreateFolderAndCopyFile
    Parallel.ForEach(docs, (doc, state, index) =>
                            {
                                CreateFolderAndCopyFile((int)index);
                            });
}
You are not synchronizing access to index, which means you have a race on it. That is why you get the error. For illustrative purposes, you can avoid the race and keep this particular design by using Interlocked.Increment.
private static void Func<T>(IEnumerable<T> docs)
{
    int index = -1;
    Parallel.ForEach(
        docs, doc =>
        {
            // Interlocked.Increment atomically increments the shared
            // variable (passed by ref) and returns the new value
            int nextIndex = Interlocked.Increment(ref index);
            CreateFolderAndCopyFile(nextIndex);
        }
    );
}
However, as others suggest, the alternative overload of ForEach that provides a loop index is clearly a cleaner solution to this particular problem.
But once you get it working you will find that copying files is I/O-bound rather than processor-bound, and I predict that the parallel code will be slower than the serial code.
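For comparison, here is a plain serial version of Func (a sketch; the list stands in for the CreateFolderAndCopyFile call from the question). It needs no synchronization at all and assigns each document a unique, sequential index:

```csharp
using System;
using System.Collections.Generic;

class SerialDemo
{
    // Serial version of Func: no shared-state race,
    // each doc gets a unique, sequential index.
    internal static List<int> Func<T>(IEnumerable<T> docs)
    {
        var indices = new List<int>();   // stands in for CreateFolderAndCopyFile(index)
        int index = 0;
        foreach (T doc in docs)
        {
            indices.Add(index);
            index++;
        }
        return indices;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", Func(new[] { "a", "b", "c" }))); // prints 0,1,2
    }
}
```

For a disk-bound workload like this, the serial loop is a reasonable baseline to measure the parallel versions against.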