I have this code to read a PDF and extract a string from it.
It works well, but the log repeatedly shows the DEBUG messages listed after the code, and I don't know why:
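import java.io.File;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Package names below assume PDFBox 2.x.
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class Test {
    public static void main(String[] args) {
        PDDocument doc = null;
        try {
            doc = PDDocument.load(new File("C:/prueba.pdf"));
            PDFTextStripper pdfs = new PDFTextStripper();
            String textOfPdf = pdfs.getText(doc);
            // Six groups of five uppercase letters or digits, separated by hyphens.
            String regex = "([A-Z0-9]{5}-[A-Z0-9]{5}-[A-Z0-9]{5}-[A-Z0-9]{5}-[A-Z0-9]{5}-[A-Z0-9]{5})";
            Pattern patron = Pattern.compile(regex);
            Matcher emparejador = patron.matcher(textOfPdf);
            // group() throws IllegalStateException unless find() returned true.
            if (emparejador.find()) {
                String text = emparejador.group(0);
                System.out.print(text);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (doc != null) {
                    doc.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}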
12:52:37.335 [main] DEBUG org.apache.pdfbox.pdfparser.PDFObjectStreamParser - parsed=COSObject{25, 0}
12:52:37.336 [main] DEBUG org.apache.pdfbox.pdfparser.PDFObjectStreamParser - parsed=COSObject{26, 0}
12:52:37.336 [main] DEBUG org.apache.pdfbox.pdfparser.PDFObjectStreamParser - parsed=COSObject{28, 0}
12:52:37.336 [main] DEBUG org.apache.pdfbox.pdfparser.PDFObjectStreamParser - parsed=COSObject{27, 0}
12:52:37.336 [main] DEBUG org.apache.pdfbox.pdfparser.PDFObjectStreamParser - parsed=COSObject{30, 0}
12:52:37.337 [main] DEBUG org.apache.pdfbox.pdfparser.PDFObjectStreamParser - parsed=COSObject{31, 0}
12:52:37.338 [main] DEBUG org.apache.pdfbox.pdfparser.PDFObjectStreamParser - parsed=COSObject{5, 0}
12:52:37.772 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer - ScratchFileBuffer not closed!
12:52:37.772 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer - ScratchFileBuffer not closed!
12:52:37.772 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer - ScratchFileBuffer not closed!
12:52:37.773 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer - ScratchFileBuffer not closed!
12:52:37.773 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer - ScratchFileBuffer not closed!
12:52:37.773 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer - ScratchFileBuffer not closed!
12:52:37.773 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer - ScratchFileBuffer not closed!
12:52:37.773 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer - ScratchFileBuffer not closed!
12:52:37.773 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer - ScratchFileBuffer not closed!
I have also tried the tess4j library, but the same thing happens. Any ideas?
Regards
This is most likely an internal parser issue. By the looks of it, some of the PDF objects aren't explicitly closing the scratch files they are using; the buffers are being closed in a finalize method instead, which is why the messages come from the [Finalizer] thread.
It doesn't look like a problem to me, and there is not much you can do except turn off debug-level logging for that class:
log4j.logger.org.apache.pdfbox.io.ScratchFileBuffer=WARN
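The property line above uses log4j syntax, so it only takes effect when log4j is the backend in use. If logging in your setup happens to be routed through java.util.logging instead (an assumption; the question does not say which backend is active), the same effect could be sketched programmatically as below. The class name QuietScratchFileLogging and the method quietScratchFileBuffer are made up for illustration; only the logger name comes from the log output in the question.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietScratchFileLogging {
    // Keep a strong reference: java.util.logging holds loggers weakly,
    // so a level set on an otherwise unreferenced logger can be lost after GC.
    private static final Logger SCRATCH_LOGGER =
            Logger.getLogger("org.apache.pdfbox.io.ScratchFileBuffer");

    /**
     * Raises the threshold of this one logger to WARNING so that
     * lower-level records (FINE is the usual DEBUG equivalent) are dropped.
     * Returns the level now in effect on the logger.
     */
    public static Level quietScratchFileBuffer() {
        SCRATCH_LOGGER.setLevel(Level.WARNING);
        return SCRATCH_LOGGER.getLevel();
    }

    public static void main(String[] args) {
        // Call this once, before the PDF is loaded.
        System.out.println(quietScratchFileBuffer());
    }
}
```

Only the one noisy logger is changed, so DEBUG output from the rest of PDFBox is left as it was. With a different backend (log4j, logback, and so on) the level has to be set in that backend's own configuration.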