I have a web application where a user can upload any PDF via FTP. After the PDF is uploaded, I perform certain operations on it.
The problem is that the connection sometimes breaks during the FTP upload, so the uploaded PDF is incomplete (it behaves like a corrupted file). When I try to open that document in Acrobat Reader, it gives the message 'There was an error opening the document. The file is damaged and could not be repaired'.
Before I start processing the PDF, I want to check whether the uploaded file is readable, that is, not corrupted.
Does Java provide an API for that, or is there any other way to check whether a file is corrupted?
Look at the file size. Right-click on the file and choose "Properties." You will see the file size in the Properties. Compare this to another version of the file or a similar file if you have one. If you have another copy of the file and the file you have is smaller, then it may be corrupt.
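The same size comparison can be done in code instead of by hand. This is only a minimal sketch: the class name UploadSizeCheck, the method name hasExpectedSize, and the idea that the expected byte count is known in advance (for example from a reference copy or reported by the uploading client) are assumptions, not something stated in the question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; the names are illustrative only.
final class UploadSizeCheck {

    /**
     * Returns true if the file on disk has exactly the expected number of bytes.
     * expectedBytes is assumed to come from a trusted source, such as the size
     * of a reference copy.
     */
    static boolean hasExpectedSize(Path uploaded, long expectedBytes) throws IOException {
        return Files.size(uploaded) == expectedBytes;
    }
}
```

A matching size only rules out truncation. It does not show that the content itself is intact, so it is best treated as a quick first filter.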
Your code is basically OK. Try to find out which file is responsible for the corrupted zip file. Check whether digitalFile.getFile() always returns a valid and accessible argument to FileInputStream. Just add a bit of logging to your code and you will find out what's wrong.
A data or program file that has been altered accidentally by hardware or software failure or on purpose by an attacker. Because the bits are rearranged, a corrupted file is either unreadable to the hardware or, if readable, indecipherable to the software.
In Java, the iText API can be used to work with PDF files.
To check whether a PDF file can be loaded and read, use com.itextpdf.text.pdf.PdfReader.
If the file is corrupted, an exception such as com.itextpdf.text.exceptions.InvalidPdfException is thrown.
Sample code snippet:
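...
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;
...
try {
    // Opening the file throws InvalidPdfException if it cannot be parsed as a PDF
    PdfReader pdfReader = new PdfReader( pathToUploadedPdfFile );
    String textFromPdfFilePageOne = PdfTextExtractor.getTextFromPage( pdfReader, 1 );
    System.out.println( textFromPdfFilePageOne );
    pdfReader.close();
}
catch ( Exception e ) {
    // the file is corrupted or unreadable: handle exception
}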
For files that were uploaded but are corrupted, you may see the following error:
com.itextpdf.text.exceptions.InvalidPdfException: Rebuild failed: trailer not found.; Original message: PDF startxref not found.
Note: To reproduce such an exception, start saving a PDF file from the web and abort the download partway through.
Then load that file with the code snippet above and check whether it loads safely.
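The "trailer not found" and "startxref not found" messages are what a truncated upload typically looks like, because a PDF keeps its trailer and its %%EOF marker at the end of the file. As a cheap pre-check before handing the file to iText, the sketch below uses only the JDK to test for the %PDF- header and for %%EOF near the end. The class name PdfQuickCheck, the method name looksComplete, and the 1024-byte tail window are assumptions chosen for illustration. Passing this check does not prove that the PDF is valid; it only catches the common case of an upload that was cut off.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical helper; not part of iText.
final class PdfQuickCheck {

    // Number of trailing bytes scanned for the end-of-file marker (an assumed window).
    private static final int TAIL_BYTES = 1024;

    /** Returns true if the file starts with "%PDF-" and contains "%%EOF" near its end. */
    static boolean looksComplete(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long length = raf.length();
            if (length < 5) {
                return false; // too short to hold even the header
            }

            // Check the header at the very start of the file.
            byte[] head = new byte[5];
            raf.readFully(head);
            if (!"%PDF-".equals(new String(head, StandardCharsets.ISO_8859_1))) {
                return false;
            }

            // Read the tail of the file and look for the end-of-file marker.
            int tailLength = (int) Math.min(TAIL_BYTES, length);
            byte[] tail = new byte[tailLength];
            raf.seek(length - tailLength);
            raf.readFully(tail);
            return new String(tail, StandardCharsets.ISO_8859_1).contains("%%EOF");
        }
    }
}
```

A file that fails this check can be rejected immediately. A file that passes should still go through the PdfReader check, since damage in the middle of the file is not detected here.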
You can find detailed examples of the iText API at Use Case Examples of iText PDF | iText.