I need to understand if a PDF has any kind of digital signature. I have to manage huge PDFs, e.g. 500MB each, so I just need to find a way to separate non-signed from signed (so I can send just signed PDFs to a method that manages them). Any procedure found until now involves attempt to extract certificate via e.g. Bouncycastle libs (in my case, for Java): if it is present, pdf is signed, if it not present or a exception is raised, is it not (sic!). But this is obviously time/memory consuming, other than an example of resource-wastings implementation.
Is there any quick language-independent way, e.g. opening PDF file, and reading first bytes and finding an info telling that file is signed? Alternatively, is there any reference manual telling in detail how is made internally a PDF?
Thank you in advance
Using command line you can check if a file has a digital signature with pdfsig tool from poppler-utils package (works on Ubuntu 20.04).
pdfsig pdffile.pdf
will produce output with detailed data on the signatures included and validation data. If you need to scan a pdf file tree and get a list of signed pdfs you can use a bash command like:
find ./path/to/files -iname '*.pdf' \
-exec bash -c 'pdfsig "$0"; \
if [[ $? -eq 0 ]]; then \
echo "$0" >> signed-files.txt; fi' {} \;
You will get a list of signed files in signed-files.txt file in the local directory.
I have found this to be much more reliable than trying to grep some text out of a pdf file (for example, the pdfs produced by signing services in Lithuania do not contain the string "SigFlags" which was mentioned in the previous answers).
You are going to want to use a PDF Library rather than trying to implement this all yourself, otherwise you will get bogged down with handling the variations of Linearized documents, Filters, Incremental updates, object streams, cross-reference streams, and more.
With regards to reference material; per my cursory search, it looks like Adobe is no longer providing its version of the ISO 32000:2008 specification to any and all, though that specification is mainly a translation of the PDF v1.7 Reference manual to ISO-conforming language.
So assuming the PDF v1.7 Reference, the most relevant sections are going to be 8.7 (Digital Signatures), 3.6.1 (Document Catalog), and 8.6 (Interactive Forms).
The basic process is going to be:
Using a PDF library that can use the document's cross-reference table to navigate you to the right indirect objects should be faster and less resource-intensive than a brute-force search of the document for a certificate.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With