I have a PDF file (attached). My objective is to convert a PDF page to an image using PDFBox exactly as it appears (the same as using the Snipping Tool in Windows). The PDF contains all kinds of shapes and text.
I am using the following code:
PDDocument doc = PDDocument.load("Hello World.pdf");
PDPage firstPage = (PDPage) doc.getDocumentCatalog().getAllPages().get(67);
BufferedImage bufferedImage = firstPage.convertToImage(imageType,screenResolution);
ImageIO.write(bufferedImage, "png",new File("out.png"));
When I run this code, the resulting image is completely wrong (out.png attached).
How do I make PDFBox take something like a direct snapshot image?
Also, I noticed that the image quality of the PNG is not very good. Is there any way to increase the resolution of the generated image?
EDIT: Here is the PDF (see page number 68): https://drive.google.com/file/d/0B0ZiP71EQHz2NVZUcElvbFNreEU/edit?usp=sharing
EDIT 2: It seems that all the text is vanishing. I also tried using the PDFImageWriter class:
test.writeImage(doc, "png", null, 68, 69, "final.png",TYPE_USHORT_GRAY,200 );
The result is the same.
A PDF page can also be converted to an image in Java using PDFRenderer. Required JAR: PDFRenderer-0.9.0
package com.pdfrenderer.examples;
import java.awt.Graphics2D;
import java.awt.Image;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import javax.imageio.ImageIO;
import com.sun.pdfview.PDFFile;
import com.sun.pdfview.PDFPage;
public class PdfToImage {
public static void main(String[] args) {
try {
String sourceDir = "C:/Documents/Chemistry.pdf";// path of the source PDF file
String destinationDir = "C:/Documents/Converted/";// converted pages are saved in this folder
File sourceFile = new File(sourceDir);
File destinationFile = new File(destinationDir);
String fileName = sourceFile.getName().replace(".pdf", "_cover");
if (sourceFile.exists()) {
if (!destinationFile.exists()) {
destinationFile.mkdir();
System.out.println("Folder created in: "+ destinationFile.getCanonicalPath());
}
RandomAccessFile raf = new RandomAccessFile(sourceFile, "r");
FileChannel channel = raf.getChannel();
ByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
PDFFile pdf = new PDFFile(buf);
int pageNumber = 62;// which PDF page to convert
PDFPage page = pdf.getPage(pageNumber);
System.out.println("Total pages:"+ pdf.getNumPages());
// create the image
Rectangle rect = new Rectangle(0, 0, (int) page.getBBox().getWidth(), (int) page.getBBox().getHeight());
BufferedImage bufferedImage = new BufferedImage(rect.width, rect.height, BufferedImage.TYPE_INT_RGB);
// getImage arguments: width & height, clip rect, null for the ImageObserver, fill background with white, block until drawing is done
Image image = page.getImage(rect.width, rect.height, rect, null, true, true );
Graphics2D bufImageGraphics = bufferedImage.createGraphics();
bufImageGraphics.drawImage(image, 0, 0, null);
File imageFile = new File( destinationDir + fileName +"_"+ pageNumber +".png" );// change file format here. Ex: .png, .jpg, .jpeg, .gif, .bmp
ImageIO.write(bufferedImage, "png", imageFile);
System.out.println(imageFile.getName() +" File created in: "+ destinationFile.getCanonicalPath());
} else {
System.err.println(sourceFile.getName() +" does not exist");
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
ConvertedImage:
I get the same result as the OP using PDFBox version 1.8.4. In version 2.0.0-SNAPSHOT, though, it looks better:
Here only some arrows are thinner and some arrow parts are mis-drawn as boxes.
Thus,
how do i make pdfbox take something like a direct snapshot image?
The current release versions (up to 1.8.4) seem to have considerable deficits when rendering PDFs as images. You may switch to a current development version (e.g. the current trunk, 2.0.0-SNAPSHOT) or wait until the improvements are released.
Furthermore, some minor deficits remain even in 2.0.0-SNAPSHOT. You might want to present your sample document to the PDFBox people (i.e. create a corresponding issue in their JIRA) so that they can improve PDFBox further to suit your needs.
also, i noticed that the image quality of the png is not so good, is there any way to increase the resolution of the generated image?
There are convertToImage overloads with resolution parameters. Your current code actually sets the resolution to screenResolution. Increase this resolution value.
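To get a feeling for what a higher resolution value means, here is a minimal sketch of the arithmetic, using only the JDK and no PDFBox calls. It assumes the page size is given in the default PDF unit (1/72 inch) and that the renderer scales by resolution / 72. The class name RenderSize, the helper toPixels and the US Letter page size are made up for illustration; the exact rounding PDFBox applies may differ.

```java
/** Sketch: how a resolution (DPI) value maps to the pixel size of a rendered page. */
final class RenderSize {
    /** Default PDF user space: 72 units (points) per inch. */
    static final float POINTS_PER_INCH = 72f;

    /** Pixels needed to cover a length given in points at the given resolution. */
    static int toPixels(float points, int dpi) {
        return Math.round(points * dpi / POINTS_PER_INCH);
    }

    public static void main(String[] args) {
        // Assumed page size: US Letter, 612 x 792 points.
        float widthPt = 612f;
        float heightPt = 792f;
        for (int dpi : new int[] {72, 96, 300}) {
            System.out.println(dpi + " dpi -> "
                    + toPixels(widthPt, dpi) + " x " + toPixels(heightPt, dpi) + " px");
        }
    }
}
```

Under these assumptions, going from 96 to 300 dpi roughly triples the width and height of the image, so passing e.g. 300 instead of screenResolution should give a noticeably sharper PNG at the cost of a larger file.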
PS: The code to render a PDF page to image has been refactored in 2.0.0-SNAPSHOT. Instead of
BufferedImage image = page.convertToImage();
you now do
BufferedImage image = RenderUtil.convertToImage(page);
I assume this has been done to remove direct AWT references from the core classes because AWT is not available on e.g. Android.
PS: The SNAPSHOT I used last year in this answer was merely a snapshot, subject to change. The 2.0.0 release is still under development and many things have changed. In particular, there is no RenderUtil class anymore. Instead one currently has to use the PDFRenderer in the org.apache.pdfbox.rendering package...