I have been sent versions of "packed PDF" files where the top-level PDF contains child PDFs.
The top-level PDF acts primarily as a container. The packing is not always evident in Adobe reader (e.g. when pdftk is used to pack the link does not show). I can find little by Googling for this term nor in my 2012 book ("Whittington", "PDF Explained", O'Reilly).
Is this a standard part of PDF? If so I'd be grateful for pointers. And can PDFBox analyze it?
Concerning your question whether using PDF as a container file format is a standard part of PDF:
Yes, it is. ISO 32000-1:2008 describes it in section 7.11.4 Embedded file streams.
Most prominent are files associated to some document page, see 12.5.6.15, File Attachment Annotations, and those associated with the document as a whole through the EmbeddedFiles entry (PDF 1.4) in the PDF document’s name dictionary (see 7.7.4, Name Dictionary).
@JesseGood's link to PDF File Specification on the PDFBox site explains how to deal with the latter ones.
I'm not very knowledgeable concerning PDFBox and, therefore, don't know whether it allows easy access to the other kind of attachments, too. If it does not, you will essentially have to iterate the annotations of all pages to find the file attachment annotations and handle the contents according to the PDF specification.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With