Extracting images from presentation file

I am working with the python-pptx package. I need to extract all the images contained in a presentation file. Can anybody help me with this?

Thanks in advance for help.

My code looks like this:

import pptx

prs = pptx.Presentation(filename)

for slide in prs.slides:
    for shape in slide.shapes:
        print(shape.shape_type)

With shape_type, the output shows PICTURE (13) for the pictures in the presentation. But I want the pictures themselves extracted into the folder where the script is located.

Auro asked Sep 25 '18 06:09


4 Answers

A Picture (shape) object in python-pptx provides access to the image it displays:

from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE_TYPE

def iter_picture_shapes(prs):
    for slide in prs.slides:
        for shape in slide.shapes:
            if shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
                yield shape

for picture in iter_picture_shapes(Presentation(filename)):
    image = picture.image
    # ---get image "file" contents---
    image_bytes = image.blob
    # ---make up a name for the file, e.g. 'image.jpg'---
    image_filename = 'image.%s' % image.ext
    with open(image_filename, 'wb') as f:
        f.write(image_bytes)

Generating a unique file name is left to you as an exercise. All the other bits you need are here.
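
One possible way to handle the unique file names is sketched below. It is only an illustration, not part of the original answer: the helper name save_pictures and the image001-style naming scheme are assumptions. It accepts any iterable of Picture shapes (for example the iter_picture_shapes() generator above) and relies only on the image.blob and image.ext attributes already used in the answer.

```python
from pathlib import Path


def save_pictures(pictures, out_dir="."):
    """Write each picture's image into out_dir under a numbered name.

    `pictures` is any iterable of objects exposing `.image.blob` and
    `.image.ext`, such as python-pptx Picture shapes.
    (save_pictures is a hypothetical helper, not a python-pptx function.)
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for n, picture in enumerate(pictures, start=1):
        image = picture.image
        # image.ext is the extension without the dot, e.g. 'png' or 'jpg'.
        target = out_path / "image{:03d}.{}".format(n, image.ext)
        target.write_bytes(image.blob)
        written.append(target)
    return written

# Possible usage with the generator from the answer:
#   save_pictures(iter_picture_shapes(Presentation(filename)), "images")
```

Numbering by position keeps the names unique even when two slides embed files that share an original name.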

More details on the Image object are available in the documentation here:
https://python-pptx.readthedocs.io/en/latest/api/image.html#image-objects

scanny answered Oct 13 '22 02:10


The solution by scanny did not work for me because some of my pictures were nested inside group shapes, which it does not descend into. This worked for me:

from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE_TYPE

n=0
def write_image(shape):
    global n
    image = shape.image
    # ---get image "file" contents---
    image_bytes = image.blob
    # ---make up a name for the file, e.g. 'image.jpg'---
    image_filename = 'image{:03d}.{}'.format(n, image.ext)
    n += 1
    print(image_filename)
    with open(image_filename, 'wb') as f:
        f.write(image_bytes)

def visitor(shape):
    if shape.shape_type == MSO_SHAPE_TYPE.GROUP:
        for s in shape.shapes:
            visitor(s)
    if shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
        write_image(shape)

def iter_picture_shapes(prs):
    for slide in prs.slides:
        for shape in slide.shapes:
            visitor(shape)

iter_picture_shapes(Presentation(filename))
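
The same traversal can also be written as a generator, which avoids the global counter. The sketch below is an alternative, not the code from this answer: walk_shapes is a hypothetical helper, and it detects groups by the presence of a .shapes collection instead of comparing shape_type, on the assumption that only group shapes expose that attribute.

```python
def walk_shapes(shapes):
    """Yield every shape in `shapes`, descending into nested groups.

    Assumption: a shape is a group if and only if it has a `.shapes`
    attribute (walk_shapes is a hypothetical helper, not part of
    python-pptx).
    """
    for shape in shapes:
        yield shape
        children = getattr(shape, "shapes", None)
        if children is not None:
            # Recurse so pictures inside groups of groups are reached too.
            yield from walk_shapes(children)

# Possible usage with the names from the answer:
#   for slide in prs.slides:
#       for shape in walk_shapes(slide.shapes):
#           if shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
#               write_image(shape)
```

The group shapes themselves are yielded as well, so the caller still filters on shape_type as before.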

Jason Furtney answered Oct 13 '22 03:10


A PowerPoint presentation (.pptx) is just a zip file. Rename the .pptx to .zip and unzip it: the extracted folder tree contains a ppt/media folder, and the image files are in there. The whole thing takes a few lines of code.

There is no need to use python-pptx for this, although it is a great library for creating .pptx files.
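
The "few lines of code" could look roughly like the sketch below, which uses only the standard-library zipfile module. The function name extract_media and its parameters are made up for illustration. It assumes the images live under ppt/media/ inside the archive, which is where PowerPoint normally stores them; renaming the file to .zip is not required when reading it this way.

```python
import zipfile
from pathlib import Path


def extract_media(pptx_path, out_dir="."):
    """Copy every file under ppt/media/ in a .pptx archive into out_dir.

    extract_media is a hypothetical helper written for this example.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(pptx_path) as archive:
        for name in archive.namelist():
            # Skip everything outside the media folder and directory entries.
            if not name.startswith("ppt/media/") or name.endswith("/"):
                continue
            # Keep only the base name so files land directly in out_dir.
            target = out_path / Path(name).name
            target.write_bytes(archive.read(name))
            written.append(target)
    return written
```

Note that the media folder can also hold audio or video files, so filter by extension if only images are wanted.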

Coder Dev answered Oct 13 '22 03:10


Use this PPTExtractor repo for reference.

ppt = PPTExtractor("some/PowerPointFile")
# found images
len(ppt)
# image list
images = ppt.namelist()
# extract image
ppt.extract(images[0])

# save image with different name
ppt.extract(images[0], "nuevo-nombre.png")
# extract all images
ppt.extractall()

Save images in a different directory:

ppt.extract("image.png", path="/another/directory")
ppt.extractall(path="/another/directory")

Aravind answered Oct 13 '22 02:10