
Paste PDF image into Pyplot figure

How can I plot the image from a PDF file into a Pyplot figure (e.g. with plt.imshow, or inside some container I can add with ax.add_artist)?


Methods that do not work:

import matplotlib.pyplot as plt
im = plt.imread('file.pdf')

(Source: this question, where it works for PNG files.)

from PIL import Image
im = Image.open('file.pdf')

(Source: this doc, but again, it doesn't work for PDF files; the question links a library to read PDFs but the doc shows no obvious way to add them to a Pyplot figure.)

Also, this question exists, but the answers solve the problem without actually loading a PDF file.

cheersmate asked Sep 17 '18 14:09


1 Answer

There is a module called PyMuPDF (imported as fitz) that makes this job a lot easier.

Scraping PDF images into PIL Image

  1. To scrape the individual images embedded in each page, tutorials here and here show how to extract them and convert them into PIL format.

  2. If the intention is to grab an entire PDF page or pages, page.get_pixmap(), documented here, can do this by rendering the page to a raster image.

The snippet below shows how to iterate through a PDF and grab each page as a PIL.Image.

import io
import fitz
from PIL import Image

file = 'myfile.pdf'
pdf_file = fitz.open(file)

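# in case there is a need to loop through multiple PDF pages
for page_number in range(len(pdf_file)):
    page = pdf_file[page_number]
    # render the page to a raster image (a fitz.Pixmap)
    pix = page.get_pixmap()
    # tobytes() returns PNG-encoded bytes by default, which PIL can open
    pil_image = Image.open(io.BytesIO(pix.tobytes()))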

    # display code or image manipulation here for each page #

Displaying scraped PDF Image

In either case, once there is a PIL.Image object, such as the pil_image variable above, its show() method can display it (and does so differently depending on the OS). However, if the preference is to use matplotlib.pyplot.imshow, convert the PIL.Image to RGB first so that imshow receives plain RGB pixel data.

Snippet to display PIL.Image with pyplot.imshow

import matplotlib.pyplot as plt

plt.imshow(pil_image.convert('RGB'))
plt.show()  # needed when running as a script
lane answered Feb 11 '23 07:02