Extract images from a PDF using PDFBox

I'm trying to extract images from a PDF using PDFBox (the example PDF is here), but I'm only getting blank images.

The code I'm trying:

    public static void main(String[] args) {
        PDFImageExtract obj = new PDFImageExtract();
        try {
            obj.read_pdf();
        } catch (IOException ex) {
            System.out.println("" + ex);
        }
    }

    void read_pdf() throws IOException {
        PDDocument document = null;
        try {
            document = PDDocument.load("C:\\Users\\Pradyut\\Documents\\MCS-034.pdf");
        } catch (IOException ex) {
            System.out.println("" + ex);
        }
        List pages = document.getDocumentCatalog().getAllPages();
        Iterator iter = pages.iterator();
        int i = 1;
        String name = null;

        while (iter.hasNext()) {
            PDPage page = (PDPage) iter.next();
            PDResources resources = page.getResources();
            Map pageImages = resources.getImages();
            if (pageImages != null) {
                Iterator imageIter = pageImages.keySet().iterator();
                while (imageIter.hasNext()) {
                    String key = (String) imageIter.next();
                    PDXObjectImage image = (PDXObjectImage) pageImages.get(key);
                    image.write2file("C:\\Users\\Pradyut\\Documents\\image" + i);
                    i++;
                }
            }
        }
    }

Thanks

247

asked Jan 02 '12 20:01

Pradyut Bhattacharya

2 Answers

Here is code using PDFBox 2.0.1 that collects all images from the PDF. It differs from the other code in that it recurses into form XObjects instead of only reading the images in each page's top-level resources.

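    import java.awt.image.RenderedImage;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.pdfbox.cos.COSName;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.pdmodel.PDResources;
    import org.apache.pdfbox.pdmodel.graphics.PDXObject;
    import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
    import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

    // Collects the images of every page in the document.
    public List<RenderedImage> getImagesFromPDF(PDDocument document) throws IOException {
        List<RenderedImage> images = new ArrayList<>();
        for (PDPage page : document.getPages()) {
            images.addAll(getImagesFromResources(page.getResources()));
        }

        return images;
    }

    // Walks a resource dictionary, descending into form XObjects, which carry their own resources.
    private List<RenderedImage> getImagesFromResources(PDResources resources) throws IOException {
        List<RenderedImage> images = new ArrayList<>();
        if (resources == null) {
            // A page or form XObject may have no resource dictionary at all.
            return images;
        }

        for (COSName xObjectName : resources.getXObjectNames()) {
            PDXObject xObject = resources.getXObject(xObjectName);

            if (xObject instanceof PDFormXObject) {
                images.addAll(getImagesFromResources(((PDFormXObject) xObject).getResources()));
            } else if (xObject instanceof PDImageXObject) {
                images.add(((PDImageXObject) xObject).getImage());
            }
        }

        return images;
    }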
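The method above returns the images in memory but does not write them to disk. As a minimal sketch of one way to save them, the helper below writes each image as a PNG using only javax.imageio from the JDK. The class name ImageSaver, the method saveAsPng, the PNG format, and the prefix_n.png naming are all assumptions for illustration and are not part of the answer or of PDFBox.

```java
import java.awt.image.RenderedImage;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;

// Hypothetical helper (not part of PDFBox): writes already-decoded images to PNG files.
final class ImageSaver {

    private ImageSaver() {
    }

    /**
     * Writes each image as <prefix>_<n>.png (n starts at 1) inside targetDir,
     * creating the directory if needed, and returns the written paths in order.
     */
    static List<Path> saveAsPng(List<? extends RenderedImage> images, Path targetDir, String prefix)
            throws IOException {
        Files.createDirectories(targetDir);
        List<Path> written = new ArrayList<>();
        int index = 1;
        for (RenderedImage image : images) {
            Path target = targetDir.resolve(prefix + "_" + index + ".png");
            // ImageIO.write returns false when no registered writer can handle the image.
            if (!ImageIO.write(image, "png", target.toFile())) {
                throw new IOException("No PNG writer accepted image " + index);
            }
            written.add(target);
            index++;
        }
        return written;
    }
}
```

Combined with the method from the answer, a call might look like ImageSaver.saveAsPng(getImagesFromPDF(document), Paths.get("out"), "image"), where the output directory and prefix are placeholders.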

200

answered Sep 24 '22 23:09

Matt

The GetImagesFromPDF class below (PDFBox 1.x API) extracts all images from 04-Request-Headers.pdf and saves them to the destination folder PDFCopy.

    import java.io.File;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Map;

    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.pdmodel.PDResources;
    import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage;

    @SuppressWarnings({ "unchecked", "rawtypes", "deprecation" })
    public class GetImagesFromPDF {
        public static void main(String[] args) {
            try {
                // Put the PDF to read in the PDFCopy folder
                String sourceDir = "C:/PDFCopy/04-Request-Headers.pdf";
                String destinationDir = "C:/PDFCopy/";
                File oldFile = new File(sourceDir);
                if (oldFile.exists()) {
                    PDDocument document = PDDocument.load(sourceDir);
                    try {
                        List<PDPage> list = document.getDocumentCatalog().getAllPages();

                        String fileName = oldFile.getName().replace(".pdf", "_cover");
                        int totalImages = 1;
                        for (PDPage page : list) {
                            PDResources pdResources = page.getResources();

                            Map pageImages = pdResources.getImages();
                            if (pageImages != null) {
                                Iterator imageIter = pageImages.keySet().iterator();
                                while (imageIter.hasNext()) {
                                    String key = (String) imageIter.next();
                                    PDXObjectImage pdxObjectImage = (PDXObjectImage) pageImages.get(key);
                                    // write2file appends a suffix that matches the image type
                                    pdxObjectImage.write2file(destinationDir + fileName + "_" + totalImages);
                                    totalImages++;
                                }
                            }
                        }
                    } finally {
                        // Release the file handle even if extraction fails
                        document.close();
                    }
                } else {
                    System.err.println("File does not exist");
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

answered Sep 24 '22 23:09

UdayKiran Pulipati

Tags: java, image, pdf, pdfbox
