Need to check if PDF Tags have properties as per Accessibility guidelines. Examples:
So far I was able to read the metadata keys and the marked flag:
PDDocument.getDocumentInformation().getMetadataKeys();
PDDocument.getDocumentCatalog().getMarkInfo().isMarked();
To access the tags, I have tried these options:
PDDocument.getDocumentCatalog().getAcroForm()
returns null.
PDDocument.getDocumentCatalog().getPages().get(0).getAnnotations();
returns null.
PDDocument.getDocumentCatalog().getStructureTreeRoot().getKids()
returns only one object, of type StructElem.
The accessible PDF is created using OpenText, so the dev team doesn't know about PDFBox. I am lost as to how to get access to the tags/objects (use MarkedContent or something else?).
Please suggest how to extract the individual objects (tags) such as P, H1, Table, and Figure/Image, and how to validate their properties. Note: manual validation of these properties is currently performed using Adobe Acrobat Pro.
Based upon https://issues.apache.org/jira/browse/PDFBOX-7, it appears that you can use PDFMarkedContentExtractor to get the information that you need.
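Separately, the single StructElem returned by getKids() may simply be the top-level element of the structure tree (often Document), with the P, H1, Table and Figure tags nested below it. The following is a minimal sketch of walking the tree recursively, assuming the PDFBox 2.x API; the class name TagWalker and the method collectTags are made up for illustration, and only the alternate description is read as an example property.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.PDStructureElement;
import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.PDStructureNode;
import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.PDStructureTreeRoot;

// Hypothetical helper class, not part of PDFBox.
class TagWalker {

    /** Recursively collects one indented line per structure element below the given node. */
    static void collectTags(PDStructureNode node, int depth, List<String> out) {
        for (Object kid : node.getKids()) {
            // Kids can also be marked-content references, object references
            // or plain MCID integers; only structure elements are tags.
            if (kid instanceof PDStructureElement) {
                PDStructureElement element = (PDStructureElement) kid;
                StringBuilder line = new StringBuilder();
                for (int i = 0; i < depth; i++) {
                    line.append("  ");
                }
                line.append(element.getStructureType());
                // Alt text is one example of a property to validate (e.g. on Figure).
                String alt = element.getAlternateDescription();
                if (alt != null) {
                    line.append(" alt=\"").append(alt).append('"');
                }
                out.add(line.toString());
                collectTags(element, depth + 1, out);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: TagWalker <file.pdf>");
            return;
        }
        // Assumption: PDFBox 2.x. In 3.x the document is loaded with Loader.loadPDF(file).
        try (PDDocument document = PDDocument.load(new File(args[0]))) {
            PDStructureTreeRoot root = document.getDocumentCatalog().getStructureTreeRoot();
            if (root == null) {
                System.out.println("No structure tree found");
                return;
            }
            List<String> tags = new ArrayList<>();
            collectTags(root, 0, tags);
            tags.forEach(System.out::println);
        }
    }
}
```

Other properties can be read from each PDStructureElement in the same place, for example getTitle(), getLanguage() or getActualText(). Which of them are required depends on the accessibility guideline being checked.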