I am working with the python-pptx package. For my code I need to extract all the images present inside a presentation file. Can anybody help me with this?
Thanks in advance for the help.
My code looks like this:
import pptx

prs = pptx.Presentation(filename)
for slide in prs.slides:
    for shape in slide.shapes:
        print(shape.shape_type)
Using shape_type shows that PICTURE (13) shapes are present in the presentation, but I want the pictures extracted into the folder where the code is located.
If you want to separately use files or objects from a PowerPoint presentation, such as videos, photos, or sounds, you can extract them by converting the presentation to a “zipped” file folder. Note, however, that you can't extract PDFs or .dotx files.
Here's what to do: Open the PPT file and select File > Save As. In the Save As window, select an image format from the Save As Type drop-down list. Select All Slides to export the entire PPT file or Just This One to export the selected slide.
Copy the file into the same or a different folder (the extra copy is just to be on the safe side), then change the copy's extension to .zip (yes, make the PowerPoint file an archive!). Once you do that, right-click on the file and extract it to a folder on your machine.
A Picture (shape) object in python-pptx provides access to the image it displays:
from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE_TYPE

def iter_picture_shapes(prs):
    for slide in prs.slides:
        for shape in slide.shapes:
            if shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
                yield shape

for picture in iter_picture_shapes(Presentation(filename)):
    image = picture.image
    # ---get image "file" contents---
    image_bytes = image.blob
    # ---make up a name for the file, e.g. 'image.jpg'---
    image_filename = 'image.%s' % image.ext
    with open(image_filename, 'wb') as f:
        f.write(image_bytes)
Generating a unique file name is left to you as an exercise. All the other bits you need are here.
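One possible way to do that exercise, as a minimal sketch: name each file after a SHA-1 hash of its bytes, so two different images never collide and the same image is written only once. The helper name save_image_blob and the out_dir parameter are made up for illustration and are not part of python-pptx.

```python
import hashlib
from pathlib import Path


def save_image_blob(blob, ext, out_dir="."):
    """Write image bytes to out_dir under a content-derived file name.

    Hypothetical helper, not part of python-pptx. The name is the SHA-1
    hex digest of the bytes, so identical images map to the same file.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    digest = hashlib.sha1(blob).hexdigest()
    target = out_path / "{}.{}".format(digest, ext)

    # Skip the write when an identical image was already saved.
    if not target.exists():
        target.write_bytes(blob)
    return target
```

Inside the loop above this would be called as save_image_blob(image.blob, image.ext). A running counter in the file name works too if you would rather keep duplicates as separate files.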
More details on the Image object are available in the documentation here:
https://python-pptx.readthedocs.io/en/latest/api/image.html#image-objects
The solution by scanny did not work for me because I had image elements in group elements. This worked for me:
from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE_TYPE

n = 0

def write_image(shape):
    global n
    image = shape.image
    # ---get image "file" contents---
    image_bytes = image.blob
    # ---make up a name for the file, e.g. 'image.jpg'---
    image_filename = 'image{:03d}.{}'.format(n, image.ext)
    n += 1
    print(image_filename)
    with open(image_filename, 'wb') as f:
        f.write(image_bytes)

def visitor(shape):
    # ---recurse into group shapes so nested pictures are found---
    if shape.shape_type == MSO_SHAPE_TYPE.GROUP:
        for s in shape.shapes:
            visitor(s)
    if shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
        write_image(shape)

def iter_picture_shapes(prs):
    for slide in prs.slides:
        for shape in slide.shapes:
            visitor(shape)

iter_picture_shapes(Presentation(filename))
A PowerPoint presentation is just a zip file. Rename the .pptx to .zip, unzip it, locate the media folder (ppt/media), and take the image files from there. This needs only a few lines of code. (There is no need to use python-pptx for this, although it is a great library for creating .pptx files.)
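The zip approach can be sketched with nothing but the standard library. This is a rough example, assuming the usual .pptx layout where embedded media lives under ppt/media/. The function name extract_media and the out_dir default are made up for illustration. Note that this folder can also hold audio and video, not only pictures, and no renaming to .zip is needed because zipfile reads the .pptx directly.

```python
import zipfile
from pathlib import Path

# Assumed location of embedded media inside a .pptx archive.
MEDIA_PREFIX = "ppt/media/"


def extract_media(pptx_path, out_dir="media"):
    """Copy every file under ppt/media/ in the archive into out_dir.

    Hypothetical helper for illustration. Returns the written paths.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    written = []
    with zipfile.ZipFile(pptx_path) as archive:
        for name in archive.namelist():
            # Skip everything outside the media folder and directory entries.
            if not name.startswith(MEDIA_PREFIX) or name.endswith("/"):
                continue
            # Keep only the base name so files land directly in out_dir.
            target = out_path / Path(name).name
            target.write_bytes(archive.read(name))
            written.append(target)
    return written
```

Calling extract_media("some/PowerPointFile.pptx") would write the files into a media folder next to the script. Unlike the python-pptx approach, this does not tell you which slide each image came from.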
Use this PPTExtractor repo for reference.
ppt = PPTExtractor("some/PowerPointFile")
# found images
len(ppt)
# image list
images = ppt.namelist()
# extract image
ppt.extract(images[0])
# save image with different name
ppt.extract(images[0], "nuevo-nombre.png")
# extract all images
ppt.extractall()
Save images in a different directory:
ppt.extract("image.png", path="/another/directory")
ppt.extractall(path="/another/directory")