Multithreading for loop while maintaining order

I started messing around with multithreading for a CPU-intensive batch process I'm running. Essentially, I'm trying to condense multiple single-page TIFFs into single PDF documents. This works fine with a foreach loop or standard iteration, but it can be very slow for several 100-page documents. Based on some examples I found, I tried the following to use multithreading, and it gives significant performance improvements. However, it obliterates the page order: instead of 1, 2, 3, 4 it will be 1, 3, 4, 2, 6, 5, depending on which thread completes first.

My question is: how would I use this technique while maintaining the page order, and, if I can, will that negate the performance benefit of the multithreading? Thank you in advance.

PdfDocument doc = new PdfDocument();
string mail = textBox1.Text;
string[] split = mail.Split(new string[] { Environment.NewLine }, StringSplitOptions.None);

int counter = split.Count();

// Source must be array or IList.
var source = Enumerable.Range(0, 100000).ToArray();
// Partition the entire source array.
var rangePartitioner = Partitioner.Create(0, counter);
double[] results = new double[counter];
// Loop over the partitions in parallel.
Parallel.ForEach(rangePartitioner, (range, loopState) =>
{
    // Loop over each range element without a delegate invocation.
    for (int i = range.Item1; i < range.Item2; i++)
    {
        string f_prime = split[i].Replace(" ", ""); // local per iteration; a shared f_prime would be a data race
        PdfPage page = doc.AddPage();
        XGraphics gfx = XGraphics.FromPdfPage(page);
        XImage image = XImage.FromFile(f_prime);
        double x = 0;
        gfx.DrawImage(image, x, 0);

    }
});
asked Dec 28 '10 at 01:12 by David


3 Answers

I would just use the overload of Parallel.ForEach that also passes the element index to the loop body:

 Parallel.ForEach(rangePartitioner, (range, loopState, elementIndex) =>

Then, in your loop, you can fill an array with the result of your work and go through the results in order once they have all completed.
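A minimal, self-contained sketch of that idea, assuming .NET 6 or later (for Random.Shared) and C# top-level statements. RenderPage and the pageN.tif names are hypothetical stand-ins for the real TIFF loading and drawing, so it runs without PdfSharp. One subtlety: when the loop runs over rangePartitioner as in the question, the index handed to the body is the index of the range, not of the page; inside each range, i already is the page index, so results[i] would work there too. The sketch uses the plain-array overload so that the element index is the page index.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical input: one TIFF path per line, as split from textBox1 in the question.
string[] paths = Enumerable.Range(1, 20).Select(n => $"page{n}.tif").ToArray();

// One slot per input; each iteration writes only to its own slot, so no lock is needed.
string[] rendered = new string[paths.Length];

// This overload passes the element's index (a long) alongside the element itself.
Parallel.ForEach(paths, (path, loopState, index) =>
{
    rendered[index] = RenderPage(path);
});

// Parallel.ForEach returns only after every iteration has finished,
// so this sequential pass sees all results in input order.
foreach (string page in rendered)
{
    Console.WriteLine(page);
}

// Hypothetical stand-in for the expensive per-page work (XImage.FromFile + DrawImage).
static string RenderPage(string path)
{
    Thread.Sleep(Random.Shared.Next(1, 10)); // uneven work, so threads finish out of order
    return $"rendered:{path}";
}
```

Only the final loop is sequential. If most of the time goes into loading and drawing each image, that pass should be cheap compared with the parallel work, so keeping the order need not cancel the speedup.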

answered Sep 18 '22 at 12:09 by BrokenGlass


I'm not sure the other solutions will work exactly the way he wants. The reason is that PdfPage page = doc.AddPage(); creates a new page and appends it to the document in one step, so the pages will always end up out of order: their order is decided first come, first served by whichever thread calls doc.AddPage() first.

If AddPage is a fast operation, you can create all 100 pages up front, in order, without any processing. Then go back through and render the TIFF images into the pages in parallel.

PdfDocument doc = new PdfDocument();
string mail = textBox1.Text;
string[] split = mail.Split(new string[] { Environment.NewLine }, StringSplitOptions.None);

int counter = split.Count();

// Source must be array or IList.
var source = Enumerable.Range(0, 100000).ToArray();
// Partition the entire source array.
var rangePartitioner = Partitioner.Create(0, counter);

double[] results = new double[counter];

PdfPage[] pages = new PdfPage[counter];
for (int i = 0; i < counter; ++i) 
{
    pages[i] = doc.AddPage();
}

// Loop over the partitions in parallel.
Parallel.ForEach(rangePartitioner, (range, loopState) =>
{
    // Loop over each range element without a delegate invocation.
    for (int i = range.Item1; i < range.Item2; i++)
    {
        string f_prime = split[i].Replace(" ", ""); // local per iteration to avoid a data race
        PdfPage page = pages[i];
        XGraphics gfx = XGraphics.FromPdfPage(page);
        XImage image = XImage.FromFile(f_prime);
        double x = 0;
        gfx.DrawImage(image, x, 0);
    }
});
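The same pre-allocation idea can be tried without PdfSharp. In this sketch a List&lt;StringBuilder&gt; plays the role of PdfDocument and each StringBuilder plays a PdfPage; both substitutions are assumptions for illustration only, and the sketch says nothing about whether PdfSharp itself tolerates drawing on different pages of one document from several threads. The pages are created sequentially, so their order is fixed before any thread starts, and each worker writes only to the pages in its own range.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

// Hypothetical input: one TIFF path per line, as split from textBox1 in the question.
string[] paths = Enumerable.Range(1, 50).Select(n => $"scan{n}.tif").ToArray();

// Step 1 (sequential): create every page up front, so document order equals input order.
// List<StringBuilder> stands in for PdfDocument; each StringBuilder stands in for a PdfPage.
var document = new List<StringBuilder>();
for (int i = 0; i < paths.Length; i++)
{
    document.Add(new StringBuilder());
}

// Step 2 (parallel): page i receives file i. The list is only read here,
// and each StringBuilder is written by exactly one thread.
var rangePartitioner = Partitioner.Create(0, paths.Length);
Parallel.ForEach(rangePartitioner, range =>
{
    for (int i = range.Item1; i < range.Item2; i++)
    {
        document[i].Append(paths[i]); // stand-in for XGraphics.FromPdfPage + DrawImage
    }
});

Console.WriteLine(string.Join(", ", document));
```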

Edit

I think there is a more elegant solution, but without knowing the properties of PdfPage I didn't want to offer it before. If you can tell which page a PdfPage belongs to, you can make things very simple, like so:

PdfDocument doc = new PdfDocument();
string mail = textBox1.Text;
string[] split = mail.Split(new string[] { Environment.NewLine }, StringSplitOptions.None);

int counter = split.Count();

// Source must be array or IList.
var source = Enumerable.Range(0, 100000).ToArray();
// Partition the entire source array.
var rangePartitioner = Partitioner.Create(0, counter);

double[] results = new double[counter];

// Loop over the partitions in parallel.
Parallel.ForEach(rangePartitioner, (range, loopState) =>
{
    // Loop over each range element without a delegate invocation.
    for (int i = range.Item1; i < range.Item2; i++)
    {
        PdfPage page = doc.AddPage();
        // Only use i as a loop not as the index
        int pageIndex = page.PageIndex; // This is what I don't know
        string f_prime = split[pageIndex].Replace(" ", ""); // local per iteration to avoid a data race
        XGraphics gfx = XGraphics.FromPdfPage(page);
        XImage image = XImage.FromFile(f_prime);
        double x = 0;
        gfx.DrawImage(image, x, 0);
    }
});
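A sketch of that idea under a loud assumption: that AddPage is thread-safe and that each page knows its own position. The real PdfPage property is unknown (PageIndex above is a guess), so here Interlocked.Increment on a shared counter plays the part of a hypothetical thread-safe AddPage that hands out positions 0, 1, 2 and so on. Whichever thread claims position k then loads file k, so the order comes out right even though threads race for positions.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical input: one TIFF path per line, as split from textBox1 in the question.
string[] paths = Enumerable.Range(1, 50).Select(n => $"scan{n}.tif").ToArray();

// Stand-in for the document: slot k holds whatever was drawn on page k.
string[] document = new string[paths.Length];

// Shared counter behind the hypothetical thread-safe AddPage; -1 so the first claim is 0.
int nextPageIndex = -1;

Parallel.ForEach(Partitioner.Create(0, paths.Length), range =>
{
    for (int i = range.Item1; i < range.Item2; i++)
    {
        // Hypothetical AddPage: atomically claim the next page position.
        int pageIndex = Interlocked.Increment(ref nextPageIndex);

        // As in the answer, i only counts iterations; the file is chosen by the page's index.
        document[pageIndex] = paths[pageIndex];
    }
});

Console.WriteLine(string.Join(", ", document));
```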
answered Sep 20 '22 at 12:09 by Andrew T Finnell


Use .AsParallel().AsOrdered(), as described in this document: http://msdn.microsoft.com/en-us/library/dd460677.aspx

I think it would look something like this:

rangePartitioner.AsParallel().AsOrdered().ForAll(
    range => 
    {
        // Loop over each range element without a delegate invocation.
        ...
    });
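One caveat with the ForAll version: ForAll runs the delegate on each element as soon as it is produced and does not merge results back into source order, so AsOrdered does not make the delegate run in page order. The ordering that AsOrdered provides applies to results you consume afterwards, for example through Select followed by ToArray or a foreach. A sketch of that variant, querying the array of paths directly rather than the range partitioner, where RenderPage is a hypothetical stand-in for the image work and .NET 6 or later is assumed for Random.Shared:

```csharp
using System;
using System.Linq;
using System.Threading;

// Hypothetical input: one TIFF path per line, as split from textBox1 in the question.
string[] paths = Enumerable.Range(1, 30).Select(n => $"scan{n}.tif").ToArray();

// AsOrdered makes PLINQ yield results in source order even though
// the Select delegate runs on several threads at once.
string[] rendered = paths
    .AsParallel()
    .AsOrdered()
    .Select(path => RenderPage(path))
    .ToArray();

// Sequential, in-order consumption: this is where the pages would be added to the document.
foreach (string page in rendered)
{
    Console.WriteLine(page);
}

// Hypothetical stand-in for the expensive per-page work.
static string RenderPage(string path)
{
    Thread.Sleep(Random.Shared.Next(1, 10)); // uneven work, so threads finish out of order
    return $"rendered:{path}";
}
```

Preserving order costs PLINQ some extra buffering, so it may be slightly slower than an unordered query, but the expensive work still runs in parallel.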
answered Sep 22 '22 at 12:09 by StriplingWarrior