
PDF to Image conversion [duplicate]

Possible Duplicate:
Convert pdf file to jpg asp.net

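// Assumes the PDFRenderer library (package com.sun.pdfview) is on the classpath
import com.sun.pdfview.PDFFile;
import com.sun.pdfview.PDFPage;

import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import javax.imageio.ImageIO;

public class Pdf2Image {

    // number of pages in the most recently converted PDF
    int length;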
    public int convertPdf2Image(String pdfname) {
        File file = new File(pdfname);
        // try-with-resources closes the file even if parsing or rendering fails
        try (RandomAccessFile raf = new RandomAccessFile(file, "r");
             FileChannel channel = raf.getChannel()) {
            ByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            PDFFile pdffile = new PDFFile(buf);
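            // draw every page to its own image
            int num = pdffile.getNumPages();

            length = num;
            // PDFRenderer page numbers start at 1 (page 0 also resolves to the
            // first page), so starting at 0 would write the first page twice
            for (int i = 1; i <= num; i++) {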
                PDFPage page = pdffile.getPage(i);
                //get the width and height for the doc at the default zoom
                int width = (int) page.getBBox().getWidth();
                int height = (int) page.getBBox().getHeight();
                Rectangle rect = new Rectangle(0, 0, width, height);
                int rotation = page.getRotation();
                Rectangle rect1 = rect;
                if (rotation == 90 || rotation == 270) {
                    rect1 = new Rectangle(0, 0, rect.height, rect.width);
                }
                //generate the image
                BufferedImage img = (BufferedImage) page.getImage(
                        rect.width, rect.height, //width & height
                        rect1, // clip rect
                        null, // null for the ImageObserver
                        true, // fill background with white
                        true // block until drawing is done
                        );
                ImageIO.write(img, "png", new File("src\\downloadedFiles\\aa" + i + ".png"));
            }
        } catch (FileNotFoundException e1) {
            System.err.println(e1.getLocalizedMessage());
        } catch (IOException e) {
            System.err.println(e.getLocalizedMessage());
        }
        return length;
    }

    public static void main(String[] args) {
        Pdf2Image p = new Pdf2Image();
        p.convertPdf2Image("src\\downloadedFiles\\todaypdf.pdf");
    }
}

I am using this code to convert a PDF file to images. It works fine for most PDFs, but throws an exception for one PDF file. The exception is:

Expected 'xref' at start of table.

Could anyone tell me why it throws this exception?

A B asked May 28 '11 06:05



1 Answer

There are many malformed PDF files out in the wild and this is most likely one of them.

It is not possible to give a definite answer without seeing the problem PDF file. My guess is that the 'startxref' entry, which gives the absolute byte offset in the PDF where the xref table should be located, is wrong. The Java library jumps to this position in the file expecting to find the keyword 'xref' but cannot find it.

http://blog.amyuni.com/?p=1627
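
The guess above can be checked without the rendering library. The following is a minimal sketch, not the library's own parsing logic: it reads the last 'startxref' entry from the tail of the file, jumps to that offset, and tests whether the keyword 'xref' is there. The class name XrefCheck, both method names, and the 1024-byte tail size are assumptions made for illustration. Note that files using cross-reference streams (PDF 1.5 and later) legitimately have an object header rather than 'xref' at that offset, so a negative result narrows the cause but does not prove the file is malformed.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class XrefCheck {

    /** Returns the offset that follows the last 'startxref' keyword, or -1 if none is found. */
    static long readStartXref(RandomAccessFile raf) throws IOException {
        // The trailer sits at the end of the file; 1024 bytes is an assumed tail size.
        int tailLen = (int) Math.min(1024, raf.length());
        byte[] tail = new byte[tailLen];
        raf.seek(raf.length() - tailLen);
        raf.readFully(tail);
        String text = new String(tail, StandardCharsets.ISO_8859_1);

        int pos = text.lastIndexOf("startxref");
        if (pos < 0) {
            return -1;
        }
        // Skip the keyword and any whitespace, then collect the decimal offset.
        int i = pos + "startxref".length();
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        int start = i;
        while (i < text.length() && Character.isDigit(text.charAt(i))) {
            i++;
        }
        return (i == start) ? -1 : Long.parseLong(text.substring(start, i));
    }

    /** True if the four bytes at the startxref offset are the keyword 'xref'. */
    static boolean xrefOffsetLooksValid(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long offset = readStartXref(raf);
            if (offset < 0 || offset + 4 > raf.length()) {
                return false;
            }
            byte[] word = new byte[4];
            raf.seek(offset);
            raf.readFully(word);
            return "xref".equals(new String(word, StandardCharsets.ISO_8859_1));
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass one or more PDF paths on the command line.
        for (String path : args) {
            String verdict = xrefOffsetLooksValid(path)
                    ? "startxref points at 'xref'"
                    : "startxref does NOT point at 'xref'";
            System.out.println(path + ": " + verdict);
        }
    }
}
```

Running it against the failing file and against one that converts correctly should show whether the offset is the difference between them.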

One way to fix this is to open the file in the full version of Acrobat and save it again. Acrobat will fix the xref offset, as described in the link.

There are quite large companies out there, which should know better, generating malformed PDFs. Adobe lets these files exist because it makes it hard for their PDF competitors to keep up and compete.

Andrew Cash answered Oct 19 '22 16:10
