
iTextSharp: out of memory exception when merging multiple PDFs

I have to merge multiple one-page PDFs into a single PDF. I'm using iTextSharp 5.5.5.0 to do this, but when I merge more than 900-1000 PDFs I get an out of memory exception. I noticed that even if I free my reader and close it, the memory is never released properly (the amount of memory used by the process never decreases), so I was wondering what I could be doing wrong. This is my code:

    using (MemoryStream msOutput = new MemoryStream())
    {
        Document doc = new Document();
        PdfSmartCopy pCopy = new PdfSmartCopy(doc, msOutput);
        doc.Open();
        foreach (Tuple<string, int> file in filesList)
        {
            PdfReader pdfFile = new PdfReader(file.Item1);
            // Append the pages of this file Item2 times.
            for (int j = 0; j < file.Item2; j++)
            {
                // NumberOfPages is always 1 in this case.
                for (int i = 1; i <= pdfFile.NumberOfPages; i++)
                {
                    pCopy.AddPage(pCopy.GetImportedPage(pdfFile, i));
                }
            }
            pCopy.FreeReader(pdfFile);
            pdfFile.Close();
            File.Delete(file.Item1);
        }
        pCopy.Close();
        doc.Close();

        // Copy the merged PDF out of memory and write it to disk.
        byte[] content = msOutput.ToArray();
        using (FileStream fs = File.Create(Out))
        {
            fs.Write(content, 0, content.Length);
        }
    }

It never gets as far as writing the file: I get the out of memory exception during the pCopy.AddPage() call. I even tried flushing the pCopy variable, but that didn't change anything. I looked through the iText documentation and various questions on StackOverflow, and it seems to me that I'm following every suggestion for keeping memory usage low, yet memory usage still keeps growing. Any ideas on this?

Daniele Sassoli asked Apr 02 '15 14:04



1 Answer

Since this is a large amount of data, I'd recommend writing directly to a FileStream instead of a MemoryStream. This may be a case where an out of memory exception literally means "out of memory": the MemoryStream holds the entire merged PDF in RAM, and ToArray() then makes a second full copy of it.

Also, as Bruno pointed out, the "smart" part of PdfSmartCopy unfortunately comes at the cost of memory, too, because it has to remember what it has already written in order to reuse duplicate objects. Switching to PdfCopy should reduce memory pressure, although your final PDF might be larger.
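
Putting both suggestions together might look roughly like the sketch below. It assumes iTextSharp 5.5.x and the same filesList shape as in the question (file path plus number of copies); the PdfMerger class, the Merge method, and the outputPath parameter are hypothetical names used only for illustration, and this has not been measured against the original workload.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical helper, not part of iTextSharp.
static class PdfMerger
{
    public static void Merge(IEnumerable<Tuple<string, int>> filesList, string outputPath)
    {
        // Write straight to disk so the merged PDF is never held in RAM as a whole.
        using (FileStream fsOutput = new FileStream(outputPath, FileMode.Create, FileAccess.Write))
        {
            Document doc = new Document();
            // PdfCopy instead of PdfSmartCopy: no duplicate detection, less bookkeeping.
            PdfCopy copy = new PdfCopy(doc, fsOutput);
            doc.Open();

            foreach (Tuple<string, int> file in filesList)
            {
                PdfReader reader = new PdfReader(file.Item1);
                try
                {
                    // Append the pages of this file Item2 times, as in the question.
                    for (int j = 0; j < file.Item2; j++)
                    {
                        for (int i = 1; i <= reader.NumberOfPages; i++)
                        {
                            copy.AddPage(copy.GetImportedPage(reader, i));
                        }
                    }
                    // Let the writer release what it holds for this reader.
                    copy.FreeReader(reader);
                }
                finally
                {
                    reader.Close();
                }
            }

            // Closing the document also closes the writer and finishes the file.
            doc.Close();
        }
    }
}
```

Deleting the source files, as the question's code does, is left out here; it could be added after reader.Close() if needed. The trade-off is the one described above: without the duplicate detection of PdfSmartCopy, resources shared between the input files (fonts, images) may be written once per file, so the output can be larger.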

Chris Haas answered Nov 15 '22 05:11
