How to tell whether a PDF is tagged

Is it possible to determine programmatically whether a PDF is "tagged" (for accessibility)? I'm using PHP and would like, if possible, to simply read a PDF file and return true if it is tagged and false if not.

I've looked at FPDF and TCPDF, but it isn't clear to me whether either can extract this information.

asked May 27 '12 15:05 by Terrill Thompson



1 Answer

In the official ISO specification for PDF 1.7 (ISO 32000-1, in the copy available for free from the Adobe website), I read on page 574:

"A Tagged PDF document shall also contain a mark information dictionary (see Table 321) with a value of true for the Marked entry."

To me that means...

  1. ...you'll have to parse the PDF structure and
  2. ...look for the document catalogue
  3. ...where there should be a MarkInfo entry
  4. ...specifying a mark information dictionary
  5. ...which should contain a key named Marked with a boolean value of true for tagged PDF.
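
The steps above could be approximated in plain PHP roughly as follows. This is only a heuristic sketch, not a real PDF parser: it scans the raw bytes for a MarkInfo entry instead of walking the cross-reference table to the document catalogue. It assumes the catalogue and the mark information dictionary are stored uncompressed; in files that use object streams (PDF 1.5 and later) it can return false even though the document is tagged, and it does not handle incremental updates or comments inside dictionaries. The function names `pdfLooksTagged` and `isTaggedPdfFile` are made up for this example.

```php
<?php
declare(strict_types=1);

/**
 * Heuristic: does the raw PDF data contain a MarkInfo dictionary
 * whose Marked entry is true? (Hypothetical helper, not a full parser.)
 */
function pdfLooksTagged(string $pdfData): bool
{
    // Case 1: the dictionary is written inline in the catalogue,
    // e.g. /MarkInfo << /Marked true >>
    if (preg_match('/\/MarkInfo\s*<<[^>]*\/Marked\s+true\b/', $pdfData) === 1) {
        return true;
    }

    // Case 2: the catalogue points to an indirect object,
    // e.g. /MarkInfo 12 0 R, so look for "12 0 obj << ... /Marked true".
    if (preg_match('/\/MarkInfo\s+(\d+)\s+(\d+)\s+R\b/', $pdfData, $m) === 1) {
        $pattern = '/(?<!\d)' . $m[1] . '\s+' . $m[2]
                 . '\s+obj\s*<<[^>]*\/Marked\s+true\b/';
        return preg_match($pattern, $pdfData) === 1;
    }

    // No MarkInfo entry found in the uncompressed part of the file.
    return false;
}

/**
 * Convenience wrapper that reads the file from disk first.
 */
function isTaggedPdfFile(string $path): bool
{
    $data = file_get_contents($path);
    if ($data === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return pdfLooksTagged($data);
}
```

Calling `isTaggedPdfFile('document.pdf')` (the path is a placeholder) would then return a boolean. For anything beyond a quick check, a library that actually parses the PDF structure is the safer route, since only a real parser can follow the steps listed above reliably.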

answered Sep 22 '22 19:09 by Kurt Pfeifle