Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Removing PDF invisible objects with iTextSharp

Tags:

c#

pdf

itextsharp

Is possible to use iTextSharp to remove from a PDF document objects that are not visible (or at least not being displayed)?

More details:

1) My source is a PDF page containing images and text (maybe some vectorial drawings) and embedded fonts.

2) There's an interface to design multiple 'crop boxes'.

3) I must generate a new PDF that contains only what is inside the crop boxes. Anything else must be removed from resulting document (indeed I may accept content which is half inside and half outside, but this is not the ideal and it should not appear anyway).

My solution so far:

I have successfully developed a solution that creates new temporary documents, each one containing the content of each crop box (using writer.GetImportedPage and contentByte.AddTemplate to a page that is exactly the size of the crop box). Then I create the final document and repeat the process, using the AddTemplate method do position each "cropped page" in the final page.

This solution has 2 big disadvantages:

  • the size of the document is the [original size] * [number of crop boxes], since the entire page is there, stamped many times! (invisible, but it's there)
  • the invisible text may still be accessed by selecting all (CTRL+A) within Reader and pasted.

So, I think I need to iterate through PDF objects, detect if it is visible or not, and delete it. At the time of writing, I am trying to use pdfReader.GetPdfObject.

Thanks for the help.

like image 491
Hetote Avatar asked Mar 05 '13 18:03

Hetote


3 Answers

If the PDF which you are trying is a template/predefined/fixed then you can remove that object by calling RemoveField.

PdfReader pdfReader = new PdfReader(../Template_Path.pdf"));
PdfStamper pdfStamperToPopulate = new PdfStamper(pdfReader, new FileStream(outputPath, FileMode.Create));
AcroFields pdfFormFields = pdfStamperToPopulate.AcroFields;
pdfFormFields.RemoveField("fieldNameToBeRemoved");
like image 98
Praveena M Avatar answered Nov 13 '22 02:11

Praveena M


PdfReader pdfReader = new PdfReader(../Template_Path.pdf"));
PdfStamper pdfStamperToPopulate = new PdfStamper(pdfReader, new FileStream(outputPath, FileMode.Create));
AcroFields pdfFormFields = pdfStamperToPopulate.AcroFields;
pdfFormFields.RemoveField("fieldNameToBeRemoved");
like image 1
Amin AmiriDarban Avatar answered Nov 13 '22 02:11

Amin AmiriDarban


Yes, it's possible. You need to parse pdf page content bytes to PdfObjects, store them to the memory, delete unvanted PdfObject's, build Pdf content from PdfObject's back to pdf content bytes, replace page content in PdfReader just before you import the page via PdfWriter.

I would recommend you to check out this: http://habjan.blogspot.com/2013/09/proof-of-concept-converting-pdf-files.html

Sample from the link implements Pdf content bytes parsing, building back from PdfObjec's, replacing PdfReader page content bytes...

like image 1
HABJAN Avatar answered Nov 13 '22 02:11

HABJAN