
Converting PDF to image (with proper formatting)

I have a PDF file (attached). My goal is to convert a PDF page to an image using PDFBox exactly as it appears on screen, much like capturing it with the Snipping Tool in Windows. The PDF contains all kinds of shapes and text.

I am using the following code:

PDDocument doc = PDDocument.load("Hello World.pdf");
// page indices are zero-based, so index 67 is page 68 of the document
PDPage page = (PDPage) doc.getDocumentCatalog().getAllPages().get(67);
BufferedImage bufferedImage = page.convertToImage(imageType, screenResolution);
ImageIO.write(bufferedImage, "png", new File("out.png"));

When I run this code, the resulting image is completely wrong (out.png attached).

How do I make PDFBox take something like a direct snapshot image?

Also, I noticed that the image quality of the PNG is not very good. Is there any way to increase the resolution of the generated image?

EDIT: Here is the PDF (see page 68): https://drive.google.com/file/d/0B0ZiP71EQHz2NVZUcElvbFNreEU/edit?usp=sharing

EDIT 2: It seems that all the text is vanishing. I also tried using the PDFImageWriter class:

test.writeImage(doc, "png", null, 68, 69, "final.png", TYPE_USHORT_GRAY, 200);

The result is the same.

harveyslash asked Jan 12 '23 01:01


2 Answers

Using PDFRenderer, it is possible to convert a PDF page to an image.

The following Java example converts a single PDF page to a PNG. Required jar: PDFRenderer-0.9.0.

package com.pdfrenderer.examples;

import java.awt.Graphics2D;
import java.awt.Image;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

import javax.imageio.ImageIO;

import com.sun.pdfview.PDFFile;
import com.sun.pdfview.PDFPage;

public class PdfToImage {
    public static void main(String[] args) {
        String sourcePath = "C:/Documents/Chemistry.pdf"; // PDF file to convert
        String destinationDir = "C:/Documents/Converted/"; // the rendered page is saved in this folder

        File sourceFile = new File(sourcePath);
        File destinationFile = new File(destinationDir);

        String fileName = sourceFile.getName().replace(".pdf", "_cover");

        if (!sourceFile.exists()) {
            System.err.println(sourceFile.getName() + " does not exist");
            return;
        }

        // try-with-resources closes the file and its channel even if rendering fails
        try (RandomAccessFile raf = new RandomAccessFile(sourceFile, "r");
             FileChannel channel = raf.getChannel()) {
            if (!destinationFile.exists()) {
                destinationFile.mkdirs();
                System.out.println("Folder created in: " + destinationFile.getCanonicalPath());
            }

            ByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            PDFFile pdf = new PDFFile(buf);
            int pageNumber = 62; // which PDF page to convert
            PDFPage page = pdf.getPage(pageNumber);

            System.out.println("Total pages: " + pdf.getNumPages());

            // create the target image at the size of the page's bounding box
            Rectangle rect = new Rectangle(0, 0, (int) page.getBBox().getWidth(), (int) page.getBBox().getHeight());
            BufferedImage bufferedImage = new BufferedImage(rect.width, rect.height, BufferedImage.TYPE_INT_RGB);

            // arguments: width, height, clip rect, ImageObserver (null), fill background with white, block until drawing is done
            Image image = page.getImage(rect.width, rect.height, rect, null, true, true);
            Graphics2D bufImageGraphics = bufferedImage.createGraphics();
            bufImageGraphics.drawImage(image, 0, 0, null);
            bufImageGraphics.dispose();

            // to change the output format, change both the extension and the format name (e.g. png, jpg, gif, bmp)
            File imageFile = new File(destinationDir + fileName + "_" + pageNumber + ".png");
            ImageIO.write(bufferedImage, "png", imageFile);

            System.out.println(imageFile.getName() + " created in: " + destinationFile.getCanonicalPath());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Converted image: Chemistry_cover_62
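
The example above sizes the image from page.getBBox(), which means roughly one pixel per PDF point. A larger image can probably be obtained by multiplying both dimensions by a zoom factor before passing them to page.getImage, on the assumption that getImage scales the clip rectangle to the requested width and height. The sketch below only illustrates the sizing and copying step with the standard library; the class ImageCopy and its methods are made up for this illustration and are not part of PDFRenderer.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Image;
import java.awt.image.BufferedImage;

public class ImageCopy {
    /** Scales a page dimension given in points by a zoom factor, rounding up to whole pixels. */
    static int scaled(double pagePoints, double zoom) {
        return (int) Math.ceil(pagePoints * zoom);
    }

    /** Copies any java.awt.Image onto a white, opaque RGB BufferedImage of the given size. */
    static BufferedImage toBufferedImage(Image source, int width, int height) {
        BufferedImage target = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = target.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height); // white background, in case the source has transparency
            g.drawImage(source, 0, 0, null);
        } finally {
            g.dispose(); // release the graphics context
        }
        return target;
    }

    public static void main(String[] args) {
        // Stand-in for the Image that page.getImage(...) would return (assumption: 10 x 10 points, zoom 2).
        int w = scaled(10, 2.0);
        int h = scaled(10, 2.0);
        BufferedImage rendered = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        BufferedImage out = toBufferedImage(rendered, w, h);
        System.out.println(out.getWidth() + " x " + out.getHeight());
    }
}
```

In the PDFRenderer code, the scaled width and height would replace rect.width and rect.height in both the BufferedImage constructor and the getImage call, while the clip rectangle stays at the page's original bounding box.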

UdayKiran Pulipati answered Jan 17 '23 15:01


I get the same result as the OP using PDFBox version 1.8.4. In version 2.0.0-SNAPSHOT, though, it looks better: only some arrows are thinner, and some arrow parts are mis-drawn as boxes.

Thus, regarding your question

"How do I make PDFBox take something like a direct snapshot image?"

The current release versions (up to 1.8.4) seem to have considerable deficiencies when rendering PDFs as images. You may switch to a current development version (e.g. the current trunk, 2.0.0-SNAPSHOT) or wait until the improvements are released.

Furthermore, even 2.0.0-SNAPSHOT has some minor deficiencies. You might want to present your sample document to the PDFBox developers (i.e. create a corresponding issue in their JIRA) so that they can improve PDFBox further to suit your needs.

"Also, I noticed that the image quality of the PNG is not very good. Is there any way to increase the resolution of the generated image?"

There are convertToImage overloads with resolution parameters. Your current code actually sets the resolution to screenResolution. Increase this resolution value.
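
To get a feel for which resolution value to pass, it helps to recall that PDF page sizes are normally measured in points (1/72 inch), so the pixel size of the rendered image grows linearly with the resolution. The following sketch is plain Java and does not call PDFBox; the class RenderSize and its method are made up for illustration, and the rounding is an assumption (a renderer may round differently).

```java
public class RenderSize {
    /** Default PDF user space: 72 units (points) per inch. */
    static final double POINTS_PER_INCH = 72.0;

    /** Pixel length of a page edge given in points when rendered at the given DPI (rounded to nearest). */
    static int toPixels(double points, int dpi) {
        return (int) Math.round(points * dpi / POINTS_PER_INCH);
    }

    public static void main(String[] args) {
        // US Letter (612 x 792 pt) is used here only as a sample page size.
        double widthPt = 612;
        double heightPt = 792;
        for (int dpi : new int[] {72, 96, 300}) {
            System.out.println(dpi + " dpi -> "
                    + toPixels(widthPt, dpi) + " x " + toPixels(heightPt, dpi) + " px");
        }
    }
}
```

With the 1.8.x call from the question, the resolution is the second argument of convertToImage(imageType, resolution), so passing a value such as 300 instead of the screen resolution should give a correspondingly larger and sharper image.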

PS: The code to render a PDF page to an image has been refactored in 2.0.0-SNAPSHOT. Instead of

BufferedImage image = page.convertToImage();

you now do

BufferedImage image = RenderUtil.convertToImage(page);

I assume this has been done to remove direct AWT references from the core classes because AWT is not available on e.g. Android.


PS: The SNAPSHOT I used last year in this answer was merely a snapshot and subject to change. The 2.0.0 release is still under development, and many things have changed. In particular, there is no RenderUtil class anymore. Instead, one currently has to use the PDFRenderer in the org.apache.pdfbox.rendering package...

mkl answered Jan 17 '23 14:01