PDFBox: exception when loading large files

Tags: java, pdfbox

I'm trying to convert the first page of a PDF file to an image using PDFBox. When I load a large PDF file, I get an exception.

Code:

    // PDFBox 1.8.x API
    try (InputStream input = new URL("http://www.jewishfederations.org/local_includes/downloads/39497.pdf").openStream()) {
        // This is the call that throws the exception shown below
        PDDocument doc = PDDocument.load(input);
        try {
            PDPage firstPage = (PDPage) doc.getDocumentCatalog().getAllPages().get(0);
            BufferedImage image = firstPage.convertToImage();
            File outputfile = new File("image2.png");
            ImageIO.write(image, "png", outputfile);
        } finally {
            // Release the document even if rendering or writing fails
            doc.close();
        }
    } catch (IOException e) {
        e.printStackTrace();
    }

Exception:

    org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
    WARNING: Specified stream length 72435 is wrong. Fall back to reading stream until 'endstream'.
    org.apache.pdfbox.exceptions.WrappedIOException: Could not push back 72435 bytes in order to reparse stream. Try increasing push back buffer using system property org.apache.pdfbox.baseParser.pushBackSize
        at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:554)
        at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:605)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:194)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1219)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1186)
        at Worker.main(Worker.java:27)
    Caused by: java.io.IOException: Push back buffer is full
        at java.io.PushbackInputStream.unread(Unknown Source)
        at org.apache.pdfbox.io.PushBackInputStream.unread(PushBackInputStream.java:144)
        at org.apache.pdfbox.io.PushBackInputStream.unread(PushBackInputStream.java:133)
        at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:550)
        ... 5 more


asked Apr 08 '14 19:04

user2958571

1 Answer

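The exception message itself points to one fix: increase the pushback buffer through the system property org.apache.pdfbox.baseParser.pushBackSize. An alternative solution for the 1.8.* PDFBox versions is to use the non-sequential parser. In that case, the code would not be

    doc = PDDocument.load(input);

but

    doc = PDDocument.loadNonSeq(input, null);

The second argument is the scratch file; passing null keeps everything in memory. That parser (which will be the only one in the upcoming 2.0 version) does not depend on the size of a pushback buffer.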
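To illustrate why the pushback size matters for the sequential parser, here is a minimal stdlib-only sketch. It does not use PDFBox at all: `PushbackDemo` and `canPushBack` are made-up names, and the 65536 figure is my assumption about the 1.8.x default buffer size (the trace only tells us that 72435 bytes did not fit). The property name is the one quoted in the exception message.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

// Hypothetical demo class, not part of PDFBox.
class PushbackDemo {

    // Reads `count` bytes, then tries to push them all back into a
    // pushback buffer of `bufferSize` bytes. Returns false if the JDK
    // rejects the unread (IOException "Push back buffer is full").
    static boolean canPushBack(int count, int bufferSize) {
        byte[] data = new byte[count];
        try (PushbackInputStream in =
                 new PushbackInputStream(new ByteArrayInputStream(data), bufferSize)) {
            byte[] chunk = new byte[count];
            int read = in.read(chunk, 0, count);
            in.unread(chunk, 0, read);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 72435 is the stream length from the trace; 65536 is an ASSUMED default.
        System.out.println(canPushBack(72435, 65536));   // prints "false"
        System.out.println(canPushBack(72435, 100000));  // prints "true"

        // Workaround hinted at by the exception message. The value is an
        // arbitrary example; set it before PDDocument.load(...) is called.
        System.setProperty("org.apache.pdfbox.baseParser.pushBackSize", "100000");
    }
}
```

The same property can be passed on the command line as -Dorg.apache.pdfbox.baseParser.pushBackSize=100000. Any fixed size can still be exceeded by a larger broken stream, which is why switching to the non-sequential parser is the more robust choice.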


answered Sep 20 '22 20:09

Tilman Hausherr
