I am using PDFBox 2.0. While parsing a PDF document, I also want to render the first page as an image and store it in HBase so I can show it in search results (I am building a search results page like the search page of amazon.com).
HBase stores (indexes) values as byte[], so I need to convert the image to a byte[] before storing it. I have implemented the image rendering, but how can I convert the image to a byte[]?
```java
import java.awt.image.BufferedImage;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

PDDocument document = PDDocument.load(file, "");
BufferedImage image = null;
try {
    PDFRenderer pdfRenderer = new PDFRenderer(document);
    if (document.isEncrypted()) {
        try {
            System.out.println("Trying to decrypt...");
            document.setAllSecurityToBeRemoved(true);
            System.out.println("The file has been decrypted.");
        } catch (Exception e) {
            throw new Exception("The file cannot be decrypted.", e);
        }
    }
    // Render the first page (index 0) at 300 DPI as an RGB image.
    image = pdfRenderer.renderImageWithDPI(0, 300, ImageType.RGB);
} catch (Exception e) {
    e.printStackTrace();
} finally {
    // Always release the document, even if rendering failed.
    document.close();
}
```
If I add `ImageIOUtil.writeImage(image, fileName + ".jpg", 300);` right after the image is rendered, the program creates a JPG file in the project path. I need to put the image into a byte[] array instead of creating a file. Is that possible?
This can be done with `ImageIO.write(RenderedImage, String, OutputStream)`, which writes to an arbitrary OutputStream rather than to disk (BufferedImage implements RenderedImage). A ByteArrayOutputStream collects the written bytes in memory, and its `toByteArray()` method returns them as a byte[].
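```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import javax.imageio.ImageIO;
// ...
// example image; JPEG has no alpha channel, so use an RGB type (recent JDKs
// refuse to write TYPE_INT_ARGB images as JPEG and ImageIO.write returns false)
BufferedImage image = new BufferedImage(4, 3, BufferedImage.TYPE_INT_RGB);
// write the encoded image into an in-memory buffer instead of a file
ByteArrayOutputStream bos = new ByteArrayOutputStream();
if (!ImageIO.write(image, "jpg", bos)) {
    throw new IOException("No ImageIO writer found for jpg");
}
byte[] output = bos.toByteArray();
System.out.println(Arrays.toString(output));
```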
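Since the bytes will eventually be read back out of HBase to display in the search results, it can help to wrap both directions in small helpers. The sketch below is one way to do that, assuming only the JDK's `javax.imageio`; the class name `ImageBytes` and the methods `toBytes` and `fromBytes` are hypothetical names chosen for illustration, and a blank 40x30 image stands in for the page rendered by PDFRenderer.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper class: encodes images to byte[] and decodes them back.
public class ImageBytes {

    // Encode an image into a byte[] in the given format ("jpg", "png", ...).
    static byte[] toBytes(BufferedImage image, String format) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // ImageIO.write returns false when no writer can handle this image/format pair.
        if (!ImageIO.write(image, format, bos)) {
            throw new IOException("No ImageIO writer for format: " + format);
        }
        return bos.toByteArray();
    }

    // Decode bytes previously produced by toBytes back into an image.
    static BufferedImage fromBytes(byte[] bytes) throws IOException {
        BufferedImage image = ImageIO.read(new ByteArrayInputStream(bytes));
        // ImageIO.read returns null when no reader recognizes the data.
        if (image == null) {
            throw new IOException("Bytes are not in a format ImageIO can read");
        }
        return image;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the rendered first page (an RGB image, no alpha channel).
        BufferedImage page = new BufferedImage(40, 30, BufferedImage.TYPE_INT_RGB);
        byte[] bytes = toBytes(page, "jpg");   // value to store in HBase
        BufferedImage back = fromBytes(bytes); // what you would get after reading it back
        System.out.println(bytes.length > 0);
        System.out.println(back.getWidth() + "x" + back.getHeight());
    }
}
```

JPEG keeps the stored thumbnails small but is lossy; passing "png" instead gives a lossless encoding at the cost of larger byte arrays for photo-like pages.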