
What is a "packed PDF", and how can it be read?

Tags: pdf, pdfbox

I have been sent versions of "packed PDF" files where the top-level PDF contains child PDFs.

The top-level PDF acts primarily as a container. The packing is not always evident in Adobe Reader (e.g. when pdftk is used to pack the files, no link to the child PDFs is shown). I can find little about this term by Googling, and nothing in my 2012 book (Whitington, "PDF Explained", O'Reilly).

Is this a standard part of PDF? If so, I'd be grateful for pointers. And can PDFBox analyze it?

asked Nov 04 '22 04:11 by peter.murray.rust

1 Answer

Concerning your question whether using PDF as a container file format is a standard part of PDF:

Yes, it is. ISO 32000-1:2008 describes it in section 7.11.4, Embedded File Streams.

The most prominent cases are files attached to a particular document page (see 12.5.6.15, File Attachment Annotations) and files associated with the document as a whole through the EmbeddedFiles entry (PDF 1.4) in the PDF document’s name dictionary (see 7.7.4, Name Dictionary).

@JesseGood's link to PDF File Specification on the PDFBox site explains how to deal with the latter ones.
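
For the document-level case, a minimal sketch of walking the EmbeddedFiles name tree with PDFBox might look like the following. It assumes PDFBox 2.x or later on the classpath; the class name `DocumentAttachments` and its `extract` method are made up for illustration and are not part of PDFBox. The caller is expected to open the `PDDocument` itself, since the loading call differs between PDFBox versions.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentNameDictionary;
import org.apache.pdfbox.pdmodel.common.PDNameTreeNode;
import org.apache.pdfbox.pdmodel.common.filespecification.PDComplexFileSpecification;
import org.apache.pdfbox.pdmodel.common.filespecification.PDEmbeddedFile;

// Hypothetical helper, not part of PDFBox.
public class DocumentAttachments {

    /** Returns file name -> content for every entry in the EmbeddedFiles name tree. */
    public static Map<String, byte[]> extract(PDDocument doc) throws IOException {
        Map<String, byte[]> result = new LinkedHashMap<>();
        PDDocumentNameDictionary names = doc.getDocumentCatalog().getNames();
        if (names == null || names.getEmbeddedFiles() == null) {
            return result; // no document-level attachments
        }
        collect(names.getEmbeddedFiles(), result);
        return result;
    }

    // A name tree may keep its entries in the node itself or in nested kid nodes,
    // so both are visited.
    private static void collect(PDNameTreeNode<PDComplexFileSpecification> node,
                                Map<String, byte[]> out) throws IOException {
        Map<String, PDComplexFileSpecification> leaf = node.getNames();
        if (leaf != null) {
            for (Map.Entry<String, PDComplexFileSpecification> entry : leaf.entrySet()) {
                PDComplexFileSpecification spec = entry.getValue();
                PDEmbeddedFile embedded = spec.getEmbeddedFile();
                if (embedded == null) {
                    continue; // file specification without embedded data
                }
                String name = spec.getFilename() != null ? spec.getFilename() : entry.getKey();
                out.put(name, embedded.toByteArray());
            }
        }
        List<PDNameTreeNode<PDComplexFileSpecification>> kids = node.getKids();
        if (kids != null) {
            for (PDNameTreeNode<PDComplexFileSpecification> kid : kids) {
                collect(kid, out);
            }
        }
    }
}
```

Each returned byte array holding a child PDF can then be opened as a document of its own and, if needed, examined recursively.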

I'm not very knowledgeable concerning PDFBox and, therefore, don't know whether it allows easy access to the other kind of attachments, too. If it does not, you will essentially have to iterate the annotations of all pages to find the file attachment annotations and handle the contents according to the PDF specification.
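
The page-by-page iteration described above could be sketched with PDFBox roughly as follows. This assumes PDFBox 2.x or later; `AnnotationAttachments` and `extract` are hypothetical names chosen for this example, and the key format of the returned map is arbitrary.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.filespecification.PDComplexFileSpecification;
import org.apache.pdfbox.pdmodel.common.filespecification.PDEmbeddedFile;
import org.apache.pdfbox.pdmodel.common.filespecification.PDFileSpecification;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationFileAttachment;

// Hypothetical helper, not part of PDFBox.
public class AnnotationAttachments {

    /** Returns "page<N>-<i>-<file name>" -> content for each file attachment annotation. */
    public static Map<String, byte[]> extract(PDDocument doc) throws IOException {
        Map<String, byte[]> result = new LinkedHashMap<>();
        int pageNumber = 0;
        for (PDPage page : doc.getPages()) {
            pageNumber++;
            int index = 0;
            for (PDAnnotation annotation : page.getAnnotations()) {
                if (!(annotation instanceof PDAnnotationFileAttachment)) {
                    continue; // links, widgets, comments, ...
                }
                PDFileSpecification spec = ((PDAnnotationFileAttachment) annotation).getFile();
                if (!(spec instanceof PDComplexFileSpecification)) {
                    continue; // a plain file name without embedded data
                }
                PDComplexFileSpecification complex = (PDComplexFileSpecification) spec;
                PDEmbeddedFile embedded = complex.getEmbeddedFile();
                if (embedded == null) {
                    continue;
                }
                // The page number and index keep keys unique if a name repeats.
                String key = "page" + pageNumber + "-" + (index++) + "-" + complex.getFilename();
                result.put(key, embedded.toByteArray());
            }
        }
        return result;
    }
}
```

Files reached this way are independent of the EmbeddedFiles name tree, so a container PDF may need both kinds of lookup to list all of its children.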

answered Nov 09 '22 02:11 by mkl