
How can I verify that a PDF file is "good"?

I have a process that compresses PDF files that our secretaries create by scanning signed documents at a multi-function printer.

On rare occasions, these files cannot be opened in Acrobat Reader after being compressed. I don't know why this happens, so I'd like to test each PDF after compression to see whether it is "good".

I am trying to use iTextSharp 5.1.1 to accomplish this, but it loads the damaged PDF without complaint. My best guess is that Acrobat Reader fails when it tries to display the scanned image.

Any ideas on how I can tell if the PDF will render?

Kevin Buchan asked Jul 27 '11 17:07


4 Answers

In similar situations in the past I have successfully used the PDF Toolkit (also known as pdftk) to repair bad PDFs with a command like this:

pdftk broken.pdf output fixed.pdf

ewall answered Oct 05 '22 20:10


In the end I used iTextSharp to loop through all of the stream objects and check their lengths. In the files that failed, the length would be zero. This test seems quite reliable. It may not work for everyone, but it worked in this particular situation.
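
For anyone without iTextSharp at hand, the same idea can be roughed out in Python using only the standard library. This is a heuristic sketch, not the code I actually ran: it scans the raw bytes for stream ... endstream pairs instead of parsing the PDF properly, and it treats a stream as empty when nothing but line breaks sits between the two keywords. The names find_empty_streams and file_has_empty_stream are made up for this example.

```python
import re
from pathlib import Path

# "stream" followed by its end-of-line marker, then the shortest possible
# body up to the next "endstream". The lookbehind stops the pattern from
# starting inside the word "endstream" itself.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)


def find_empty_streams(pdf_bytes: bytes) -> list[int]:
    """Return the byte offsets of streams that contain no data.

    Heuristic only: binary stream data that happens to contain the
    bytes "endstream" would confuse this scan.
    """
    empty_offsets = []
    for match in STREAM_RE.finditer(pdf_bytes):
        # Ignore the line breaks that may surround the stream body.
        body = match.group(1).strip(b"\r\n")
        if len(body) == 0:
            empty_offsets.append(match.start())
    return empty_offsets


def file_has_empty_stream(path: str) -> bool:
    """Convenience wrapper (hypothetical name): read a file and scan it."""
    return bool(find_empty_streams(Path(path).read_bytes()))
```

Calling file_has_empty_stream("compressed.pdf") after the compression step would flag files showing the zero-length symptom described above. It will not catch other kinds of damage.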

Kevin Buchan answered Oct 05 '22 21:10


pdfcpu works great. Relaxed example:

pdfcpu validate goggles.pdf

Strict example:

pdfcpu validate -m strict goggles.pdf

https://pdfcpu.io/core/validate

Zombo answered Oct 05 '22 21:10


qpdf is well suited to this. Install it and run a check:

apt-get install qpdf

qpdf --check filename.pdf

Example output:

checking filename.pdf
PDF Version: 1.4
File is not encrypted
File is not linearized
WARNING: filename.pdf: file is damaged
WARNING: filename.pdf (object 185 0, file position 1235875): expected n n obj
WARNING: filename.pdf: Attempting to reconstruct cross-reference table
WARNING: filename.pdf: object 185 0 not found in file after regenerating cross reference table
operation for Dictionary object attempted on object of wrong type
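
To run this check automatically after each compression, the exit status of qpdf can be used instead of parsing the text output. The sketch below assumes qpdf is installed and on the PATH, and relies on the exit codes given in the qpdf manual (0 for no problems, 2 for errors, 3 for warnings only). The function names and the status strings are illustrative choices, not part of qpdf.

```python
import subprocess

# Exit codes as documented in the qpdf manual:
# 0 = no errors or warnings, 2 = errors found, 3 = warnings but no errors.
QPDF_STATUS = {0: "ok", 2: "error", 3: "warning"}


def classify_qpdf_exit(returncode: int) -> str:
    """Map a qpdf exit code to a short status string (labels are our own)."""
    return QPDF_STATUS.get(returncode, "unknown")


def check_pdf(path: str) -> str:
    """Run `qpdf --check` on a file and return its status.

    Raises FileNotFoundError if the qpdf executable is not installed.
    """
    result = subprocess.run(["qpdf", "--check", path], capture_output=True)
    return classify_qpdf_exit(result.returncode)
```

A caller could treat anything other than "ok" as a reason to keep the uncompressed original, for example `if check_pdf("filename.pdf") != "ok": ...`.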

Oli answered Oct 05 '22 22:10