 

How to add metadata to PDF document using PDFbox?

I have an input stream of a PDF document available to me. I would like to add subject metadata to the document and then save it. I'm not sure how to do this.

I came across a sample recipe here: https://pdfbox.apache.org/1.8/cookbook/workingwithmetadata.html

However, it is still fuzzy to me. Below is what I'm trying, with comments where I have questions:

PDDocument doc = PDDocument.load(myInputStream);
PDDocumentCatalog catalog = doc.getDocumentCatalog();
InputStream newXMPData = ...; //what goes here? How can I add subject tag?
PDMetadata newMetadata = new PDMetadata(doc, newXMPData, false);
catalog.setMetadata( newMetadata );
//does anything else need to happen to save the document??
//I would like an outputstream of the document (with metadata) so that I can save it to an S3 bucket
asked Oct 27 '16 at 22:10 by Anthony
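The `...` placeholder in the question needs an XMP packet supplied as an InputStream. As a sketch, assuming the document's Subject is expressed in XMP as `dc:description` (the Dublin Core mapping the XMP specification uses for the PDF Subject entry), a minimal packet can be built with the Java standard library alone. The class name `XmpPacket` and its methods are hypothetical, not part of PDFBox.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a minimal XMP packet carrying only a subject.
final class XmpPacket {
    private XmpPacket() {}

    // Escapes the five XML special characters so arbitrary text is safe inside an element.
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Returns the packet as text; dc:description is a language alternative, so the
    // subject goes into an rdf:Alt entry tagged x-default.
    static String subjectXmpString(String subject) {
        return "<?xpacket begin=\"\uFEFF\" id=\"W5M0MpCehiHzreSzNTczkc9d\"?>\n"
            + "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">\n"
            + "  <rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n"
            + "    <rdf:Description rdf:about=\"\"\n"
            + "        xmlns:dc=\"http://purl.org/dc/elements/1.1/\">\n"
            + "      <dc:description>\n"
            + "        <rdf:Alt>\n"
            + "          <rdf:li xml:lang=\"x-default\">" + escapeXml(subject) + "</rdf:li>\n"
            + "        </rdf:Alt>\n"
            + "      </dc:description>\n"
            + "    </rdf:Description>\n"
            + "  </rdf:RDF>\n"
            + "</x:xmpmeta>\n"
            + "<?xpacket end=\"w\"?>";
    }

    // Same packet as UTF-8 bytes wrapped in a stream, ready for the PDMetadata constructor.
    static InputStream subjectXmp(String subject) {
        return new ByteArrayInputStream(subjectXmpString(subject).getBytes(StandardCharsets.UTF_8));
    }
}
```

With the 1.8 API from the linked cookbook this would be used as `new PDMetadata(doc, XmpPacket.subjectXmp("My subject"), false)`. Note that `catalog.setMetadata(...)` replaces any existing XMP packet wholesale rather than merging into it, and nothing reaches the file until the document is saved, e.g. with `doc.save(outputStream)`.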




2 Answers

Another much easier way to do this would be to use the built-in Document Information object:

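PDDocument inputDoc = PDDocument.load(myInputStream); // your doc
PDDocumentInformation info = inputDoc.getDocumentInformation();
info.setSubject("Some subject");
info.setCreator("Some meta");
info.setCustomMetadataValue("fieldName", "fieldValue");
inputDoc.save(myOutputStream); // hypothetical stream, e.g. a ByteArrayOutputStream; changes persist only on save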

This also has the benefit of not requiring the xmpbox library.

answered Oct 25 '22 at 11:10 by jacobw125


The following code sets the title of a PDF document, but it should be adaptable to work with other properties as well:

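import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;

public static byte[] insertTitlePdf(byte[] documentBytes, String title) {
    // try-with-resources closes the PDDocument even if saving fails
    try (PDDocument document = PDDocument.load(documentBytes)) {
        PDDocumentInformation info = document.getDocumentInformation();
        info.setTitle(title);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        document.save(baos);
        return baos.toByteArray();
    } catch (IOException e) {
        e.printStackTrace();
    }

    // Reached only when loading or saving failed
    return null;
}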

This requires Apache PDFBox; with Maven, add the dependency:

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.6</version>
</dependency>

Add a title with:

byte[] documentBytesWithTitle = insertTitlePdf(documentBytes, "Some fancy title");
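The question wants the result in an S3 bucket. Assuming the S3 client in use accepts an InputStream together with a known content length (the exact SDK call is not shown, since it depends on the SDK version), the byte array returned by insertTitlePdf can be wrapped as below. `PdfUpload` is a hypothetical holder class; it also guards against the `null` that insertTitlePdf returns when an IOException occurs.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

// Hypothetical holder for an upload: a stream over the PDF bytes plus their exact length.
final class PdfUpload {
    final InputStream content;
    final long contentLength;

    private PdfUpload(InputStream content, long contentLength) {
        this.content = content;
        this.contentLength = contentLength;
    }

    // Wraps the bytes returned by insertTitlePdf; rejects null (its IOException path).
    static PdfUpload fromBytes(byte[] pdfBytes) {
        if (pdfBytes == null) {
            throw new IllegalArgumentException("no PDF bytes: insertTitlePdf returned null");
        }
        return new PdfUpload(new ByteArrayInputStream(pdfBytes), pdfBytes.length);
    }
}
```

Usage: `PdfUpload upload = PdfUpload.fromBytes(documentBytesWithTitle);` then hand `upload.content` and `upload.contentLength` to the S3 client. Supplying the length up front lets a client avoid buffering the whole stream just to measure it.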

To display the titled document in the browser (JSF example):

<object class="pdf" data="data:application/pdf;base64,#{myBean.getDocumentBytesWithTitleAsBase64()}" type="application/pdf">Document could not be loaded</object>

Result (Chrome):

[Screenshot: PDF document title change result]
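The answer does not show the backing-bean method used in the EL expression. A minimal sketch, assuming the bean simply keeps the titled bytes in a field (the field, the setter, and the omission of CDI/JSF scope annotations are assumptions): `java.util.Base64.getEncoder()` produces the basic RFC 4648 encoding without line breaks, which is what a `data:` URI expects.

```java
import java.util.Base64;

// Hypothetical JSF backing bean; scope/naming annotations omitted for brevity.
class MyBean {
    private byte[] documentBytesWithTitle;

    void setDocumentBytesWithTitle(byte[] bytes) {
        this.documentBytesWithTitle = bytes;
    }

    // Returns the PDF bytes as single-line Base64 for the data: URI, or "" if none are set.
    public String getDocumentBytesWithTitleAsBase64() {
        if (documentBytesWithTitle == null) {
            return "";
        }
        return Base64.getEncoder().encodeToString(documentBytesWithTitle);
    }
}
```

The MIME encoder (`Base64.getMimeEncoder()`) would insert line breaks every 76 characters, which is why the basic encoder is used here.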

answered Oct 25 '22 at 11:10 by Arion Krause