I'm trying to create a PDF/A file using PDFBox 2. My code is based on the example code here. The code runs without errors, but when I validate the file with callas pdfaPilot and veraPDF, they report no XMP metadata and no PDF/A version info. The PDF file is also version 1.4, not 1.7 as set in the code.
// TTF font needed for Unicode support in OCR texts
PDFont font = PDType0Font.load(document,
        PDDocument.class.getResourceAsStream("/org/apache/pdfbox/resources/ttf/LiberationSans-Regular.ttf"), true);

// Add metadata (needed by PDF/A)
XMPMetadata xmp = XMPMetadata.createXMPMetadata();
try {
    DublinCoreSchema dc = xmp.createAndAddDublinCoreSchema();
    dc.setTitle("THE DOCUMENT TITLE");
    dc.addCreator("THE AUTHOR");

    PDFAIdentificationSchema id = xmp.createAndAddPFAIdentificationSchema();
    id.setPart(2);
    id.setConformance("B");

    XmpSerializer serializer = new XmpSerializer();
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    serializer.serialize(xmp, baos, true);

    PDMetadata metadata = new PDMetadata(document);
    metadata.importXMPMetadata(baos.toByteArray());
    document.getDocumentCatalog().setMetadata(metadata);
} catch (BadFieldValueException e) {
    throw new IllegalArgumentException("", e);
}

// Set color profile (needed by PDF/A)
InputStream colorProfile = PDDocument.class.getResourceAsStream("/sRGB.icc");
PDOutputIntent intent = new PDOutputIntent(document, colorProfile);
intent.setInfo("sRGB IEC61966-2.1");
intent.setOutputCondition("sRGB IEC61966-2.1");
intent.setOutputConditionIdentifier("sRGB IEC61966-2.1");
intent.setRegistryName("http://www.color.org");
document.getDocumentCatalog().addOutputIntent(intent);

// Render all pages
for (IPage page : pages) {
    ((PdfboxPage) page).setFont(font);
    page.renderPage(this);
    document.addPage((PDPage) page.getPage());
}

document.setVersion(1.7f);
document.save(path);
document.close();
What am I doing wrong?
EDIT 1:
I can see that the xpacket is in the PDF file and that it includes the metadata. But it looks like PDFBox doesn't write this data in a form that veraPDF and pdfaPilot accept as valid.
EDIT 2:
It looks like PDFBox 2.0.12 builds invalid PDF/A. For comparison, I converted the PDF to PDF/A-1b using our commercial pdfaPilot program.
PDFBox writes this to the PDF file (invalid in veraPDF and pdfaPilot):
<?xpacket begin="
" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/" rdf:about="">
<dc:title>
<rdf:Alt>
<rdf:li lang="x-default">THE DOCUMENT TITLE</rdf:li>
</rdf:Alt>
</dc:title>
<dc:creator>
<rdf:Seq>
<rdf:li>THE AUTHOR</rdf:li>
</rdf:Seq>
</dc:creator>
</rdf:Description>
<rdf:Description xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/" rdf:about="">
<pdfaid:part>1</pdfaid:part>
<pdfaid:conformance>B</pdfaid:conformance>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
pdfaPilot writes this to the PDF file (valid in veraPDF and pdfaPilot):
<?xpacket begin="
" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.6-c015 81.159809, 2016/11/11-01:42:16 ">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about=""
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:xmp="http://ns.adobe.com/xap/1.0/"
xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/"
xmlns:stEvt="http://ns.adobe.com/xap/1.0/sType/ResourceEvent#"
xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/"
xmlns:pdfaExtension="http://www.aiim.org/pdfa/ns/extension/"
xmlns:pdfaSchema="http://www.aiim.org/pdfa/ns/schema#"
xmlns:pdfaProperty="http://www.aiim.org/pdfa/ns/property#">
<dc:format>application/pdf</dc:format>
<dc:creator>
<rdf:Seq>
<rdf:li>AUTOR</rdf:li>
</rdf:Seq>
</dc:creator>
<dc:title>
<rdf:Alt>
<rdf:li xml:lang="x-default">TITEL</rdf:li>
</rdf:Alt>
</dc:title>
<xmp:ModifyDate>2019-01-11T11:42:22+01:00</xmp:ModifyDate>
<xmp:CreateDate>2019-01-11T11:42:21+01:00</xmp:CreateDate>
<xmp:MetadataDate>2019-01-11T11:42:22+01:00</xmp:MetadataDate>
<xmpMM:DocumentID>uuid:b60f88c2-aa89-11b2-0a00-104bbf060000</xmpMM:DocumentID>
<xmpMM:InstanceID>uuid:b61148b9-aa89-11b2-0a00-60d9faa0ff7f</xmpMM:InstanceID>
<xmpMM:RenditionClass>default</xmpMM:RenditionClass>
<xmpMM:VersionID>1</xmpMM:VersionID>
<xmpMM:History>
<rdf:Seq>
<rdf:li rdf:parseType="Resource">
<stEvt:action>converted</stEvt:action>
<stEvt:instanceID>uuid:b60f88c3-aa89-11b2-0a00-902dfba0ff7f</stEvt:instanceID>
<stEvt:parameters>converted to PDF/A-1b</stEvt:parameters>
<stEvt:softwareAgent>pdfaPilot</stEvt:softwareAgent>
<stEvt:when>2019-01-11T11:42:22+01:00</stEvt:when>
</rdf:li>
</rdf:Seq>
</xmpMM:History>
<pdfaid:part>1</pdfaid:part>
<pdfaid:conformance>B</pdfaid:conformance>
<pdfaExtension:schemas>
<rdf:Bag>
<rdf:li rdf:parseType="Resource">
<pdfaSchema:namespaceURI>http://ns.adobe.com/xap/1.0/mm/</pdfaSchema:namespaceURI>
<pdfaSchema:prefix>xmpMM</pdfaSchema:prefix>
<pdfaSchema:schema>XMP Media Management Schema</pdfaSchema:schema>
<pdfaSchema:property>
<rdf:Seq>
<rdf:li rdf:parseType="Resource">
<pdfaProperty:category>internal</pdfaProperty:category>
<pdfaProperty:description>UUID based identifier for specific incarnation of a document</pdfaProperty:description>
<pdfaProperty:name>InstanceID</pdfaProperty:name>
<pdfaProperty:valueType>URI</pdfaProperty:valueType>
</rdf:li>
<rdf:li rdf:parseType="Resource">
<pdfaProperty:category>internal</pdfaProperty:category>
<pdfaProperty:description>The common identifier for all versions and renditions of a document.</pdfaProperty:description>
<pdfaProperty:name>OriginalDocumentID</pdfaProperty:name>
<pdfaProperty:valueType>URI</pdfaProperty:valueType>
</rdf:li>
</rdf:Seq>
</pdfaSchema:property>
</rdf:li>
<rdf:li rdf:parseType="Resource">
<pdfaSchema:namespaceURI>http://www.aiim.org/pdfa/ns/id/</pdfaSchema:namespaceURI>
<pdfaSchema:prefix>pdfaid</pdfaSchema:prefix>
<pdfaSchema:schema>PDF/A ID Schema</pdfaSchema:schema>
<pdfaSchema:property>
<rdf:Seq>
<rdf:li rdf:parseType="Resource">
<pdfaProperty:category>internal</pdfaProperty:category>
<pdfaProperty:description>Part of PDF/A standard</pdfaProperty:description>
<pdfaProperty:name>part</pdfaProperty:name>
<pdfaProperty:valueType>Integer</pdfaProperty:valueType>
</rdf:li>
<rdf:li rdf:parseType="Resource">
<pdfaProperty:category>internal</pdfaProperty:category>
<pdfaProperty:description>Amendment of PDF/A standard</pdfaProperty:description>
<pdfaProperty:name>amd</pdfaProperty:name>
<pdfaProperty:valueType>Text</pdfaProperty:valueType>
</rdf:li>
<rdf:li rdf:parseType="Resource">
<pdfaProperty:category>internal</pdfaProperty:category>
<pdfaProperty:description>Conformance level of PDF/A standard</pdfaProperty:description>
<pdfaProperty:name>conformance</pdfaProperty:name>
<pdfaProperty:valueType>Text</pdfaProperty:valueType>
</rdf:li>
</rdf:Seq>
</pdfaSchema:property>
</rdf:li>
</rdf:Bag>
</pdfaExtension:schemas>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
And if I statically write this to the PDF file it produces a valid PDF/A file:
String xmpData = "<?xpacket ......";
PDMetadata metadata = new PDMetadata(document);
metadata.importXMPMetadata(xmpData.getBytes());
EDIT 3:
This much shorter packet is also valid:
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" >
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about=""
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
<dc:format>application/pdf</dc:format>
<dc:creator>
<rdf:Seq>
<rdf:li>AUTOR</rdf:li>
</rdf:Seq>
</dc:creator>
<dc:title>
<rdf:Alt>
<rdf:li xml:lang="x-default">TITEL</rdf:li>
</rdf:Alt>
</dc:title>
<pdfaid:part>1</pdfaid:part>
<pdfaid:conformance>B</pdfaid:conformance>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
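For anyone who wants to use the short packet as a stopgap, here is a minimal sketch of building it from a template instead of a hard-coded string. The class name `MinimalXmpPacket` and its methods are hypothetical helpers, not part of PDFBox. It also assumes the `begin` attribute should carry the U+FEFF marker and that the bytes should be UTF-8. Note that `String.getBytes()` without an argument uses the platform charset, which may not be UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class MinimalXmpPacket {

    /** Escapes the characters that are not allowed in XML text content. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds the short packet from EDIT 3 with the given values filled in. */
    static String build(String title, String author, int part, String conformance) {
        return "<?xpacket begin=\"\uFEFF\" id=\"W5M0MpCehiHzreSzNTczkc9d\"?>\n"
                + "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">\n"
                + "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n"
                + "<rdf:Description rdf:about=\"\"\n"
                + "    xmlns:dc=\"http://purl.org/dc/elements/1.1/\"\n"
                + "    xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\">\n"
                + "<dc:format>application/pdf</dc:format>\n"
                + "<dc:creator><rdf:Seq><rdf:li>" + escapeXml(author) + "</rdf:li></rdf:Seq></dc:creator>\n"
                // The language qualifier must be written as xml:lang, not lang.
                + "<dc:title><rdf:Alt><rdf:li xml:lang=\"x-default\">" + escapeXml(title)
                + "</rdf:li></rdf:Alt></dc:title>\n"
                + "<pdfaid:part>" + part + "</pdfaid:part>\n"
                + "<pdfaid:conformance>" + escapeXml(conformance) + "</pdfaid:conformance>\n"
                + "</rdf:Description>\n"
                + "</rdf:RDF>\n"
                + "</x:xmpmeta>\n"
                + "<?xpacket end=\"w\"?>";
    }

    /** Returns the packet as UTF-8 bytes, independent of the platform charset. */
    static byte[] buildUtf8(String title, String author, int part, String conformance) {
        return build(title, author, part, conformance).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(build("TITEL", "AUTOR", 1, "B"));
    }
}
```

The resulting byte array can be passed to `metadata.importXMPMetadata(...)` in place of `xmpData.getBytes()`. This only works around the symptom; it does not explain why the serializer output is rejected.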
There is a difference between the XML produced by the CreatePDFA example
<rdf:li xml:lang="x-default">THE DOCUMENT TITLE</rdf:li>
to what you got
<rdf:li lang="x-default">THE DOCUMENT TITLE</rdf:li>
and this reminded me of a problem we had a year and a half ago that was discussed here.
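The two lines differ only in the attribute prefix, but for a namespace-aware parser that is a real difference: `xml:lang` lives in the XML namespace, while a bare `lang` has no namespace at all. As a rough sketch using only the JDK's DOM parser (the class `XmlLangAttributeCheck` and its method are made up for illustration), you can test a fragment like this:

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class XmlLangAttributeCheck {

    private static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Reports whether a single rdf:li fragment carries a namespaced xml:lang attribute. */
    static boolean hasXmlLang(String liFragment) {
        // Wrap the fragment so the rdf prefix is declared, as it would be inside a full packet.
        String wrapped = "<rdf:Alt xmlns:rdf=\"" + RDF_NS + "\">" + liFragment + "</rdf:Alt>";
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Element root = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wrapped)))
                    .getDocumentElement();
            Element li = (Element) root.getElementsByTagNameNS(RDF_NS, "li").item(0);
            // The xml prefix is implicitly bound to the XML namespace URI.
            return li.hasAttributeNS(XMLConstants.XML_NS_URI, "lang");
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse fragment", e);
        }
    }

    public static void main(String[] args) {
        // Line as produced by the CreatePDFA example
        System.out.println(hasXmlLang("<rdf:li xml:lang=\"x-default\">THE DOCUMENT TITLE</rdf:li>"));
        // Line as found in the question's output
        System.out.println(hasXmlLang("<rdf:li lang=\"x-default\">THE DOCUMENT TITLE</rdf:li>"));
    }
}
```

The first call returns `true` and the second `false`, which is presumably the distinction the validators trip over.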
So to quote from my answer from 2017: This code
Transformer transformer = TransformerFactory.newInstance().newTransformer();
should return an instance of com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl. If it does not, then call
Transformer transformer =
TransformerFactory.newInstance("com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl", null).newTransformer();
or set a system property:
System.setProperty("javax.xml.transform.TransformerFactory", "com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl");
What I can't answer (because you didn't say) is how you ended up with a different transformer, and what will happen to the rest of your application if you change it.
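To find out which implementation your application actually picks up, a small diagnostic along these lines may help. It is only a sketch using JDK classes: the class name `TransformerCheck` and the sample element are made up for illustration, and it does not go through PDFBox's `XmpSerializer`. It prints the factory class and serializes one `rdf:li` element with an `xml:lang` attribute so you can see whether the prefix survives.

```java
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class TransformerCheck {

    private static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Serializes a one-element sample document with the given factory. */
    static String serializeSample(TransformerFactory factory) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().newDocument();

            // Build <rdf:li xml:lang="x-default">THE DOCUMENT TITLE</rdf:li>
            Element li = doc.createElementNS(RDF_NS, "rdf:li");
            li.setAttributeNS(XMLConstants.XML_NS_URI, "xml:lang", "x-default");
            li.setTextContent("THE DOCUMENT TITLE");
            doc.appendChild(li);

            Transformer transformer = factory.newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    public static void main(String[] args) {
        // Whatever the classpath and system properties resolve to in this application
        TransformerFactory factory = TransformerFactory.newInstance();
        System.out.println("Factory in use: " + factory.getClass().getName());

        String xml = serializeSample(factory);
        System.out.println(xml);
        System.out.println("xml:lang kept: " + xml.contains("xml:lang=\"x-default\""));
    }
}
```

Run it with the same classpath as your application. If the factory is not the JDK's built-in one, some dependency is probably registering its own, and that is where I would look first.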