
Python PIL can't open PDFs for some reason

My program can open PNGs but not PDFs, so I wrote this minimal test. It still fails to open even a simple PDF, and I don't know why.

from PIL import Image

with Image.open(r"Adams, K\a.pdf") as file:
    print file

Traceback (most recent call last):
  File "C:\Users\Hayden\Desktop\Scans\test4.py", line 3, in <module>
    with Image.open(r"Adams, K\a.pdf") as file:
  File "C:\Python27\lib\site-packages\PIL\Image.py", line 2590, in open
    % (filename if filename else fp))
IOError: cannot identify image file 'Adams, K\\a.pdf'

After trying PyPDF2 as suggested (thanks for the link, by the way), I am now getting this message with my code:

import PyPDF2

pdf_file= open(r"Adams, K (6).pdf", "rb")
read_pdf= PyPDF2.PdfFileReader(pdf_file)

number_of_pages = read_pdf.getNumPages()
print number_of_pages


Xref table not zero-indexed. ID numbers for objects will be corrected. [pdf.py:1736]
Hayden asked Jun 26 '18 17:06



2 Answers

Following this article, https://www.geeksforgeeks.org/convert-pdf-to-image-using-python/, you can use the pdf2image package to convert each page of the PDF into a PIL Image object.

This should solve your problem:

from pdf2image import convert_from_path

fname = r"Adams, K\a.pdf"
pil_image_lst = convert_from_path(fname) # This returns a list even for a 1 page pdf
pil_image = pil_image_lst[0]

I just tried this out with a one page pdf.
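
For a PDF with more than one page, the same call can be looped over. This is only a rough sketch: the helper names `page_png_name` and `pdf_to_pngs`, the output naming scheme, and the `dpi=200` value are my own assumptions, not part of pdf2image. Also note that pdf2image relies on poppler being installed on the machine.

```python
import os


def page_png_name(pdf_path, page_number):
    """Build an output name such as 'a_page1.png' next to the source PDF.

    Hypothetical naming scheme, chosen only for this example.
    """
    base, _ext = os.path.splitext(pdf_path)
    return "{}_page{}.png".format(base, page_number)


def pdf_to_pngs(pdf_path, dpi=200):
    """Render every page of a PDF to a PNG file and return the paths written."""
    # Imported here so page_png_name stays usable without pdf2image installed.
    from pdf2image import convert_from_path

    written = []
    # convert_from_path returns a list with one PIL Image per page.
    for number, page in enumerate(convert_from_path(pdf_path, dpi=dpi), start=1):
        out_path = page_png_name(pdf_path, number)
        page.save(out_path, "PNG")
        written.append(out_path)
    return written
```

Calling `pdf_to_pngs(r"Adams, K\a.pdf")` should then write one PNG per page beside the original file, and each PNG opens with `Image.open` as usual.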

Alexander answered Sep 28 '22 02:09


As pointed out by @Kevin in the comments, PIL can write PDFs but cannot read them.

To read a PDF you will need some other library. Here is a tutorial on handling PDFs with PyPDF2:

https://pythonhosted.org/PyPDF2/?utm_source=recordnotfound.com
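
If your program handles a mix of PNGs and PDFs, one option is to check the file's first bytes and only hand non-PDF files to `Image.open`. This is a minimal sketch using only the standard library. The function names are made up for illustration, and it assumes the `%PDF-` signature sits at the very start of the file, which may not hold for every PDF.

```python
PDF_SIGNATURE = b"%PDF-"


def looks_like_pdf(path):
    """Return True if the file starts with the PDF signature bytes."""
    with open(path, "rb") as handle:
        # Read just enough bytes to compare against the signature.
        return handle.read(len(PDF_SIGNATURE)) == PDF_SIGNATURE


def pick_reader(path):
    """Return 'pdf' for PDF files and 'image' for everything else."""
    return "pdf" if looks_like_pdf(path) else "image"
```

When `pick_reader` returns `'pdf'`, pass the file to a PDF library instead of PIL. Otherwise `Image.open` can be tried as before.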

Xantium answered Sep 28 '22 02:09