 

pdfbox convert pdf to image byte[]

Tags: java, pdfbox

Using pdfbox, is it possible to convert a PDF (or a PDF byte[]) into an image byte[]? I've looked through several examples online and the only ones I can find describe how either to directly write the converted file to the filesystem or to convert it to a Java AWT object.

I'd rather not incur the I/O of writing an image file to the filesystem, reading it back into a byte[], and then deleting it.

So this I can do:

String destinationImageFormat = "jpg";
boolean success = false;
InputStream is = getClass().getClassLoader().getResourceAsStream("example.pdf");
PDDocument pdf = PDDocument.load( is, true );

int resolution = 256;
String password = "";
String outputPrefix = "myImageFile";

PDFImageWriter imageWriter = new PDFImageWriter();    

success = imageWriter.writeImage(pdf, 
                    destinationImageFormat, 
                    password, 
                    1, 
                    2, 
                    outputPrefix, 
                    BufferedImage.TYPE_INT_RGB, 
                    resolution);

As well as this:

InputStream is = getClass().getClassLoader().getResourceAsStream("example.pdf");

PDDocument pdf = PDDocument.load( is, true );
List<PDPage> pages = pdf.getDocumentCatalog().getAllPages();

for ( PDPage page : pages )
{
    BufferedImage image = page.convertToImage();
}

What I'm not clear on is how to transform the BufferedImage into a byte[]. I know the image is written to a file output stream inside imageWriter.writeImage(), but I'm not clear on how that API works.

user2100746 asked Feb 22 '13 21:02


2 Answers

Add the Maven dependency:

    <!-- https://mvnrepository.com/artifact/org.apache.pdfbox/pdfbox -->
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>2.0.1</version>
    </dependency>

Then convert a PDF to images, one PNG file per page:

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;

private List<String> savePDF(String filePath) throws IOException {
    List<String> result = new ArrayList<>();
    File file = new File(filePath);

    // try-with-resources closes the document even if rendering fails
    try (PDDocument doc = PDDocument.load(file)) {
        PDFRenderer renderer = new PDFRenderer(doc);

        int pageCount = doc.getNumberOfPages();
        for (int i = 0; i < pageCount; i++) {
            String pngFileName = file.getPath() + "." + (i + 1) + ".png";

            // Render page i (zero-based) at 96 DPI and write it as a PNG file
            try (OutputStream out = new FileOutputStream(pngFileName)) {
                ImageIO.write(renderer.renderImageWithDPI(i, 96), "png", out);
            }
            result.add(pngFileName);
        }
    }
    return result;
}

EDIT: to get a byte[] per page instead of writing files:

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;

// Returns one PNG-encoded byte[] per page; nothing is written to disk by this method.
private List<byte[]> savePDF(String filePath) throws IOException {
    List<byte[]> result = new ArrayList<>();
    File file = new File(filePath);

    // try-with-resources closes the document even if rendering fails
    try (PDDocument doc = PDDocument.load(file)) {
        PDFRenderer renderer = new PDFRenderer(doc);

        int pageCount = doc.getNumberOfPages();
        for (int i = 0; i < pageCount; i++) {
            // ByteArrayOutputStream has no String constructor; it buffers in memory
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ImageIO.write(renderer.renderImageWithDPI(i, 96), "png", out);

            result.add(out.toByteArray()); // the encoded image as a byte array
        }
    }
    return result;
}
BeeNoisy answered Oct 25 '22 23:10


You can use ImageIO.write to write to an OutputStream. To get a byte[], use a ByteArrayOutputStream, then call toByteArray() on it.
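
As a minimal sketch of that approach, using only JDK classes: the class name ImageBytes and the helper toBytes below are hypothetical names chosen for illustration, and the blank BufferedImage stands in for whatever pdfbox returns for a page. The ImageIO.setUseCache(false) call is an optional extra not mentioned above; it asks ImageIO to buffer in memory rather than in a temporary cache file, which seems in keeping with the goal of avoiding disk I/O.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

class ImageBytes {

    // Hypothetical helper: encodes an in-memory image into the given format ("png", "jpg", ...).
    static byte[] toBytes(BufferedImage image, String format) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // ImageIO.write returns false when no registered writer can handle this image and format.
        if (!ImageIO.write(image, format, out)) {
            throw new IOException("No ImageIO writer found for format: " + format);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Optional: keep ImageIO's intermediate stream in memory instead of a temp file.
        ImageIO.setUseCache(false);

        // Stand-in for a rendered PDF page.
        BufferedImage page = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);

        byte[] png = toBytes(page, "png");
        System.out.println("PNG bytes: " + png.length);
    }
}
```

Checking the boolean returned by ImageIO.write matters mostly for formats like JPEG, where the writer may reject some image types (for example ones with an alpha channel) and would otherwise leave the byte[] silently empty.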

aditsu quit because SE is EVIL answered Oct 25 '22 22:10