
PDF metadata removal using Java

Tags: pdf, pdfbox, itext

How can I remove metadata from a PDF using Java?

Will iText do this, or do any other frameworks have the ability? I didn't find any examples or classes that remove metadata using iText. Has anybody done this before, or does anyone have any ideas?

Please share your views.

Thanks in advance.

JAVAC asked Jan 22 '14 22:01


2 Answers

First you need to distinguish between the two types of metadata in a PDF:

  1. XMP metadata
  2. The document information dictionary (DID, the older mechanism)

You remove the first like this:

PdfReader reader = stamper.getReader();
reader.getCatalog().remove(PdfName.METADATA);
reader.removeUnusedObjects();

You remove the second as SANN3 has mentioned:

HashMap<String, String> info = reader.getInfo();
info.put("Title", null);
info.put("Author", null);
info.put("Subject", null);
info.put("Keywords", null);
info.put("Creator", null);
info.put("Producer", null);
info.put("CreationDate", null);
info.put("ModDate", null);
info.put("Trapped", null);
stamper.setMoreInfo(info);
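
The list above covers the standard keys only; a document may also carry custom entries in its info dictionary. As a minimal sketch using only the standard library, the helper below nulls every key that is already in the map, whatever it is called. The class name `InfoMapCleaner` and the method `withAllValuesNulled` are hypothetical names chosen for illustration. The assumption is that the resulting map would then be passed to `stamper.setMoreInfo(...)` in the same way as in the snippet above.

```java
import java.util.HashMap;
import java.util.Map;

public class InfoMapCleaner {

    // Returns a new map with the same keys as the input, each mapped to null.
    // The input map itself is left unchanged.
    static Map<String, String> withAllValuesNulled(Map<String, String> info) {
        Map<String, String> cleared = new HashMap<>();
        for (String key : info.keySet()) {
            cleared.put(key, null);
        }
        return cleared;
    }

    public static void main(String[] args) {
        // Stand-in for the map returned by reader.getInfo() (sample values only)
        Map<String, String> info = new HashMap<>();
        info.put("Title", "Quarterly report");
        info.put("Author", "Jane Doe");
        info.put("Department", "Finance"); // a custom, non-standard key

        Map<String, String> cleared = withAllValuesNulled(info);
        System.out.println("Keys marked for removal: " + cleared.keySet());
    }
}
```

Building a new map rather than editing the original keeps the values that were read from the file available, for example for logging what was removed.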

If you then open the PDF in a text editor and search it, you will find neither the /Info dictionary nor the XMP metadata.
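
The text-editor check can also be scripted. The following is a rough sketch using only the standard library; the class name `PdfMetadataCheck` and the method `containsName` are hypothetical. It does a plain substring search over the raw bytes, so it is a quick sanity check rather than a real PDF parser: it may miss entries stored inside compressed object streams, and it may also report a match on a longer name that merely starts with the searched text.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class PdfMetadataCheck {

    // Decodes the bytes as ISO-8859-1 (one char per byte, so nothing is lost)
    // and looks for the given PDF name, e.g. "/Info" or "/Metadata".
    static boolean containsName(byte[] pdfBytes, String name) {
        String text = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        return text.contains(name);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfMetadataCheck <file.pdf>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("/Metadata present: " + containsName(bytes, "/Metadata"));
        System.out.println("/Info present: " + containsName(bytes, "/Info"));
    }
}
```

Running it against the output file, for instance `java PdfMetadataCheck sample1.pdf`, prints whether either name still appears in the raw file content.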

Lonzak answered Sep 23 '22 16:09


Try this code:

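// Open the source PDF and read its document information dictionary
PdfReader readInputPDF = new PdfReader("sample.pdf");
HashMap<String, String> hMap = readInputPDF.getInfo();
PdfStamper stamper = new PdfStamper(readInputPDF, new FileOutputStream("sample1.pdf"));
// A null value removes the entry from the output file
hMap.put("Author", null);
stamper.setMoreInfo(hMap);
// Closing the stamper writes sample1.pdf
stamper.close();
readInputPDF.close();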

For each metadata property you want to remove from the PDF, put its key into the map with a null value.

SANN3 answered Sep 22 '22 16:09