I have a process that compresses PDF files that our secretaries create by scanning signed documents at a multi-function printer.
On rare occasions, these files cannot be opened in Acrobat Reader after being compressed. I don't know why this happens, so I'd like to test each PDF after compression and see whether it is "good".
I am trying to use itextsharp 5.1.1 to do this, but it happily loads even the PDFs that Acrobat Reader cannot open. My best guess is that Acrobat Reader fails when it tries to display the scanned image.
Any ideas on how I can tell if the PDF will render?
In similar situations in the past I have successfully used the PDF Toolkit (a/k/a pdftk) to repair bad PDFs with a command like this:
pdftk broken.pdf output fixed.pdf
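If the repair step needs to run from a script, a thin Python wrapper around that same `pdftk SRC output DST` command could look like the sketch below. The function names are hypothetical, and treating a non-zero exit status as failure is an assumption, not documented pdftk behavior. A successful rewrite does not prove that Acrobat Reader will render the result, so the output file is still worth checking.

```python
import os
import subprocess


def pdftk_repair_cmd(src: str, dst: str) -> list[str]:
    """Build `pdftk SRC output DST`, which rewrites SRC into a new file DST."""
    return ["pdftk", src, "output", dst]


def repair_with_pdftk(src: str, dst: str) -> bool:
    """Rewrite src to dst; True if pdftk exited 0 and dst is a non-empty file.

    Assumption: pdftk reports failure through a non-zero exit status.
    """
    if os.path.abspath(src) == os.path.abspath(dst):
        # Writing over the file that is being read is unsafe; use a new file.
        raise ValueError("output file must differ from the input file")
    result = subprocess.run(pdftk_repair_cmd(src, dst), capture_output=True)
    return (
        result.returncode == 0
        and os.path.isfile(dst)
        and os.path.getsize(dst) > 0
    )
```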
OK, what I ended up doing was using itextsharp to loop through all of the stream objects and check their lengths. In the files that failed to open, a stream's length was zero. This test seems quite reliable. It may not work for everyone, but it worked in this particular situation.
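itextsharp is a .NET library, so the following is not the code used in this answer. It is a rough Python approximation of the same idea: scan the raw bytes for `>> stream ... endstream` pairs and flag any stream with no data between the keywords. Because it does not parse the PDF, it ignores `/Length` entirely, and binary data that happens to contain `endstream` can fool it; a real parser such as itextsharp's `PdfReader` is the more robust route. `empty_stream_offsets` is a hypothetical name.

```python
import re

# A stream object looks like: << ...dictionary... >> stream<EOL> data <EOL>endstream
# The PDF spec requires the "stream" keyword to be followed by CRLF or LF.
# Anchoring on ">>" also keeps "endstream" from being mistaken for a start.
_STREAM_START = re.compile(rb">>\s*stream\r?\n")
_STREAM_END = b"endstream"


def empty_stream_offsets(data: bytes) -> list[int]:
    """Return the byte offset (of the closing '>>') of every empty stream.

    Heuristic sketch: scans raw bytes instead of parsing the PDF.
    """
    offsets = []
    pos = 0
    while (m := _STREAM_START.search(data, pos)) is not None:
        end = data.find(_STREAM_END, m.end())
        if end == -1:
            break  # unterminated stream; nothing more to scan
        body = data[m.end():end]
        # The EOL before "endstream" is not stream data, so a body made
        # only of CR/LF bytes counts as empty.
        if body.strip(b"\r\n") == b"":
            offsets.append(m.start())
        pos = end + len(_STREAM_END)
    return offsets
```

Passing the bytes of a file opened in binary mode returns the offsets of any empty streams, so a non-empty result marks that file as suspect.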
pdfcpu works great. Relaxed example:
pdfcpu validate goggles.pdf
Strict example:
pdfcpu validate -m strict goggles.pdf
https://pdfcpu.io/core/validate
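To run this over every compressed file from a script, a thin Python wrapper is enough. This is a sketch: the helper names are hypothetical, and it assumes pdfcpu exits with a non-zero status when validation fails (worth confirming against your pdfcpu version).

```python
import subprocess
import sys


def pdfcpu_validate_cmd(path: str, strict: bool = False) -> list[str]:
    """Build a `pdfcpu validate` command line (relaxed unless strict=True)."""
    cmd = ["pdfcpu", "validate"]
    if strict:
        cmd += ["-m", "strict"]
    cmd.append(path)
    return cmd


def pdfcpu_is_valid(path: str, strict: bool = False) -> bool:
    """Run pdfcpu on one file; assumes exit status 0 means it validated."""
    result = subprocess.run(
        pdfcpu_validate_cmd(path, strict),
        capture_output=True,
        text=True,
        errors="replace",
    )
    if result.returncode != 0:
        # Surface pdfcpu's diagnostics so the failing check is visible.
        print(f"{path}:\n{result.stdout}{result.stderr}", file=sys.stderr)
    return result.returncode == 0


def invalid_files(paths: list[str], strict: bool = False) -> list[str]:
    """Return the subset of paths that pdfcpu rejects."""
    return [p for p in paths if not pdfcpu_is_valid(p, strict)]
```

Which mode better predicts the files Acrobat Reader refuses to open is something to test against a few known-bad files.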
qpdf will be of great help here; its --check option inspects the file's structure:
apt-get install qpdf
qpdf --check filename.pdf
Example output:
checking filename.pdf
PDF Version: 1.4
File is not encrypted
File is not linearized
WARNING: filename.pdf: file is damaged
WARNING: filename.pdf (object 185 0, file position 1235875): expected n n obj
WARNING: filename.pdf: Attempting to reconstruct cross-reference table
WARNING: filename.pdf: object 185 0 not found in file after regenerating cross reference table
operation for Dictionary object attempted on object of wrong type
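qpdf also reports the result through its exit status: 0 when no problems were found, 2 when it found errors, and 3 when there were only warnings. A script can branch on that instead of parsing the text above. The Python sketch below uses hypothetical names. Whether a warnings-only file should count as "good" is a policy choice, since a viewer may be able to repair the same damage.

```python
import subprocess
from enum import Enum


class QpdfStatus(Enum):
    """qpdf exit statuses relevant to --check."""
    OK = 0        # no errors or warnings
    ERRORS = 2    # errors were found
    WARNINGS = 3  # warnings only


def classify_qpdf_exit(returncode: int) -> QpdfStatus:
    """Map a qpdf exit status to QpdfStatus; reject anything unexpected."""
    try:
        return QpdfStatus(returncode)
    except ValueError:
        raise RuntimeError(f"unexpected qpdf exit status {returncode}") from None


def qpdf_check(path: str) -> tuple[QpdfStatus, str]:
    """Run `qpdf --check PATH`; return the status and qpdf's combined output."""
    result = subprocess.run(
        ["qpdf", "--check", path],
        capture_output=True,
        text=True,
        errors="replace",
    )
    return classify_qpdf_exit(result.returncode), result.stdout + result.stderr


def is_good(path: str, allow_warnings: bool = False) -> bool:
    """Policy wrapper: treat warnings as failures unless allow_warnings is set."""
    status, _ = qpdf_check(path)
    return status is QpdfStatus.OK or (
        allow_warnings and status is QpdfStatus.WARNINGS
    )
```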