iTextSharp: trimming a PDF document's pages

I have a PDF document with form fields that I'm filling out programmatically with C#. Depending on which of three conditions applies, I need to trim (delete) some of the pages from that document.

Is that possible to do?

Condition 1: keep pages 1-4, delete pages 5 and 6

Condition 2: keep pages 1-4 and 6, delete page 5

Condition 3: keep pages 1-5, delete page 6

Christopher Johnson asked Aug 30 '11 15:08


3 Answers

Use PdfReader.SelectPages() combined with PdfStamper. The code below uses iTextSharp 5.5.1.

public void SelectPages(string inputPdf, string pageSelection, string outputPdf)
{
    using (PdfReader reader = new PdfReader(inputPdf))
    {
        reader.SelectPages(pageSelection);

        using (PdfStamper stamper = new PdfStamper(reader, File.Create(outputPdf)))
        {
            stamper.Close();
        }
    }
}

Then you call this method with the correct page selection for each condition.

Condition 1:

SelectPages(inputPdf, "1-4", outputPdf);

Condition 2:

SelectPages(inputPdf, "1-4,6", outputPdf);

or

SelectPages(inputPdf, "1-6,!5", outputPdf);

Condition 3:

SelectPages(inputPdf, "1-5", outputPdf);
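
If the condition is decided elsewhere in the program, the three selection strings above can be kept in one place. This is only a sketch: the `TrimCondition` enum, its member names, and the `PageSelections` class are hypothetical and not part of iTextSharp.

```csharp
using System;

public static class PageSelections
{
    // Hypothetical names for the three cases described in the question.
    public enum TrimCondition { KeepFirstFour, KeepAllButFive, KeepFirstFive }

    // Returns the page-selection string to pass to SelectPages(...) above.
    public static string ForCondition(TrimCondition condition)
    {
        switch (condition)
        {
            case TrimCondition.KeepFirstFour: return "1-4";    // condition 1
            case TrimCondition.KeepAllButFive: return "1-4,6"; // condition 2
            case TrimCondition.KeepFirstFive: return "1-5";    // condition 3
            default: throw new ArgumentOutOfRangeException(nameof(condition));
        }
    }
}
```

A call would then look like `SelectPages(inputPdf, PageSelections.ForCondition(condition), outputPdf);`.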

Here's the comment from the iTextSharp source code on what makes up a page selection. This is in the SequenceList class which is used to process a page selection:

/**
* This class expands a string into a list of numbers. The main use is to select a
* range of pages.
* <p>
* The general systax is:<br>
* [!][o][odd][e][even]start-end
* <p>
* You can have multiple ranges separated by commas ','. The '!' modifier removes the
* range from what is already selected. The range changes are incremental, that is,
* numbers are added or deleted as the range appears. The start or the end, but not both, can be ommited.
*/
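
To make the syntax concrete, here is a minimal sketch of how such a selection string could be expanded into page numbers. It is not iTextSharp's SequenceList: the names `PageRangeSketch` and `Expand` are hypothetical, and it handles only single numbers, `start-end` ranges (with either end omitted) and the `!` modifier. The odd/even modifiers and descending ranges are left out, and the handling of repeated pages is an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class PageRangeSketch
{
    // Expands a selection such as "1-4,6" or "1-6,!5" for a document
    // with pageCount pages. Simplified illustration only.
    public static List<int> Expand(string selection, int pageCount)
    {
        var pages = new List<int>();
        foreach (string raw in selection.Split(','))
        {
            string part = raw.Trim();
            if (part.Length == 0) continue;

            // A leading '!' removes the range from what is already selected.
            bool remove = part[0] == '!';
            if (remove) part = part.Substring(1);

            int start, end;
            int dash = part.IndexOf('-');
            if (dash < 0)
            {
                // Single page number, e.g. "6"
                start = end = int.Parse(part);
            }
            else
            {
                // Range; a missing start means page 1, a missing end means the last page.
                string left = part.Substring(0, dash);
                string right = part.Substring(dash + 1);
                start = left.Length == 0 ? 1 : int.Parse(left);
                end = right.Length == 0 ? pageCount : int.Parse(right);
            }

            // Clamp to the document and apply the range incrementally.
            for (int p = Math.Max(start, 1); p <= Math.Min(end, pageCount); p++)
            {
                if (remove) pages.RemoveAll(x => x == p);
                else pages.Add(p);
            }
        }
        return pages;
    }
}
```

With a six-page document, both `Expand("1-4,6", 6)` and `Expand("1-6,!5", 6)` give pages 1, 2, 3, 4 and 6, which is why the two calls shown for condition 2 are equivalent.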

Mathew Leger answered Oct 16 '22 14:10


Here is the code I use to copy all but the last page of an existing PDF. Everything happens in memory streams. The variable pdfByteArray is a byte[] holding the original PDF (obtained using ms.ToArray()), and it is overwritten with the new PDF.

PdfReader originalPDFReader = new PdfReader(pdfByteArray);

using (MemoryStream msCopy = new MemoryStream())
{
    using (Document docCopy = new Document())
    {
        using (PdfCopy copy = new PdfCopy(docCopy, msCopy))
        {
            docCopy.Open();
            // Copy every page except the last one
            for (int pageNum = 1; pageNum <= originalPDFReader.NumberOfPages - 1; pageNum++)
            {
                copy.AddPage(copy.GetImportedPage(originalPDFReader, pageNum));
            }
            docCopy.Close();
        }
    }

    // MemoryStream.ToArray() still works after the stream has been closed
    pdfByteArray = msCopy.ToArray();
}
originalPDFReader.Close();

Craig Howard answered Oct 16 '22 13:10


Instead of deleting pages from a document, you create a new document and import only the pages that you want to keep. Below is a full working WinForms app that does that (targeting iTextSharp 5.1.1.0). The last parameter of the function removePagesFromPdf is an array of the pages to keep.

The code below works with physical files, but it would be easy to convert to streams if you don't want to write to disk.

using System;
using System.ComponentModel;
using System.IO;
using System.Linq;
using System.Windows.Forms;
using iTextSharp.text.pdf;
using iTextSharp.text;


namespace Full_Profile1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            //The files that we are working with
            string sourceFolder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
            string sourceFile = Path.Combine(sourceFolder, "Test.pdf");
            string destFile = Path.Combine(sourceFolder, "TestOutput.pdf");

            //Remove all pages except 1,2,3,4 and 6
            removePagesFromPdf(sourceFile, destFile, 1, 2, 3, 4, 6);
            this.Close();
        }
        public void removePagesFromPdf(String sourceFile, String destinationFile, params int[] pagesToKeep)
        {
            //Used to pull individual pages from our source
            PdfReader r = new PdfReader(sourceFile);
            //Create our destination file
            using (FileStream fs = new FileStream(destinationFile, FileMode.Create, FileAccess.Write, FileShare.None))
            {
                using (Document doc = new Document())
                {
                    using (PdfWriter w = PdfWriter.GetInstance(doc, fs))
                    {
                        //Open the destination for writing
                        doc.Open();
                        //Loop through each page that we want to keep
                        foreach (int page in pagesToKeep)
                        {
                            //Add a new blank page to destination document
                            doc.NewPage();
                            //Extract the given page from our reader and add it directly to the destination PDF
                            w.DirectContent.AddTemplate(w.GetImportedPage(r, page), 0, 0);
                        }
                        //Close our document
                        doc.Close();
                        //Close the reader now that all pages have been written
                        r.Close();
                    }
                }
            }
        }
    }
}
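
The question is phrased in terms of pages to delete, while removePagesFromPdf takes the pages to keep. A small helper can invert the list. This is a sketch only: `PageLists` and `PagesToKeep` are hypothetical names, and the page count is assumed to come from the reader (for example its NumberOfPages property).

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PageLists
{
    // Returns every page number from 1 to pageCount that is not listed in pagesToDelete.
    public static int[] PagesToKeep(int pageCount, params int[] pagesToDelete)
    {
        var delete = new HashSet<int>(pagesToDelete);
        return Enumerable.Range(1, pageCount)
                         .Where(p => !delete.Contains(p))
                         .ToArray();
    }
}
```

For the question's second condition, `PageLists.PagesToKeep(6, 5)` yields 1, 2, 3, 4 and 6, and the resulting array can be passed directly as the pagesToKeep argument.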

Chris Haas answered Oct 16 '22 14:10