My PowerPoint slides contain a number of group shapes, which in turn contain child text shapes.
Earlier I was using this code, but it doesn't handle group shapes:
```python
from pptx import Presentation

text_list = []
for eachfile in files:  # files: list of .pptx paths
    prs = Presentation(eachfile)
    textrun = []
    for slide in prs.slides:
        for shape in slide.shapes:
            if hasattr(shape, "text"):
                print(shape.text)
                textrun.append(shape.text)
    new_list = " ".join(textrun)
    text_list.append(new_list)
```
I am trying to extract the text from these child text boxes. I have managed to reach the child elements using GroupShape.shapes, but I get an error saying they are of type 'property', so I can't access their text or iterate over them (TypeError: 'property' object is not iterable).
```python
from pptx.shapes.group import GroupShape
from pptx import Presentation

for eachfile in files:
    prs = Presentation(eachfile)
    textrun = []
    for slide in prs.slides:
        for shape in slide.shapes:
            # raises TypeError: 'property' object is not iterable
            for text in GroupShape.shapes:
                print(text)
```
I would then like to capture the text and append it to a string for further processing.
So my question is: how do I access the child text elements and extract their text?
I have spent a lot of time going through the documentation and source code, but haven't been able to figure it out. Any help would be appreciated.
I think you need something like this:
```python
from pptx.enum.shapes import MSO_SHAPE_TYPE

for slide in prs.slides:
    # ---only operate on group shapes---
    group_shapes = [
        shp for shp in slide.shapes
        if shp.shape_type == MSO_SHAPE_TYPE.GROUP
    ]
    for group_shape in group_shapes:
        for shape in group_shape.shapes:
            if shape.has_text_frame:
                print(shape.text)
```
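A group shape contains other shapes, accessible through its `.shapes` property; it does not itself have a `.text` property. So you need to iterate over the shapes in the group and get the text from each of them. Note that `.shapes` must be read from a group shape instance, such as `group_shape` in the code above. Reading it from the `GroupShape` class itself returns the `property` object, which is why the code in the question raised `TypeError`.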
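To see why `GroupShape.shapes` fails, recall that in Python a `property` looked up on the class returns the `property` object itself; only a lookup on an instance runs the getter. The minimal sketch below uses a hypothetical stand-in class `Group` (not python-pptx's `GroupShape`) to show the difference in plain Python:

```python
class Group:
    """Hypothetical stand-in for a group shape; not python-pptx's GroupShape."""

    def __init__(self, children):
        self._children = children

    @property
    def shapes(self):
        # The getter only runs when the attribute is read from an instance.
        return self._children


# Reading the property from the class yields the property object itself.
print(type(Group.shapes))  # <class 'property'>

try:
    for child in Group.shapes:
        pass
except TypeError as exc:
    print(exc)  # 'property' object is not iterable

# Reading it from an instance runs the getter and returns the children.
group = Group(["title box", "body box"])
print(list(group.shapes))  # ['title box', 'body box']
```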
Note that this solution only goes one level deep. A recursive approach could walk the shape tree depth-first to also get the text from groups nested inside groups, if there are any.
Also note that not all shapes have text, so you must check the `.has_text_frame` property to avoid raising an exception on, say, a picture shape.
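A depth-first recursive walk that handles groups nested to any depth, combined with the `has_text_frame` check, might look like the sketch below. It relies only on the attributes used in this answer (`shape_type`, `shapes`, `has_text_frame`, `text`). To keep it runnable without a .pptx file, `FakeShape` and the `GROUP` constant are hypothetical stand-ins; with python-pptx shapes you would compare against `MSO_SHAPE_TYPE.GROUP` instead.

```python
from dataclasses import dataclass, field
from typing import Iterator, List

# Hypothetical stand-in for MSO_SHAPE_TYPE.GROUP; with python-pptx, use
# `from pptx.enum.shapes import MSO_SHAPE_TYPE` and MSO_SHAPE_TYPE.GROUP.
GROUP = "GROUP"


@dataclass
class FakeShape:
    """Hypothetical stand-in exposing the attributes the answer relies on."""
    shape_type: str = "TEXT_BOX"
    has_text_frame: bool = False
    text: str = ""
    shapes: List["FakeShape"] = field(default_factory=list)


def iter_text(shapes) -> Iterator[str]:
    """Yield the text of every text-bearing shape, depth-first."""
    for shape in shapes:
        if shape.shape_type == GROUP:
            # Descend into the group's children before moving on.
            yield from iter_text(shape.shapes)
        elif shape.has_text_frame:
            # Shapes without a text frame (e.g. pictures) are skipped.
            yield shape.text


slide_shapes = [
    FakeShape(has_text_frame=True, text="Title"),
    FakeShape(shape_type=GROUP, shapes=[
        FakeShape(has_text_frame=True, text="Outer"),
        FakeShape(shape_type=GROUP, shapes=[
            FakeShape(has_text_frame=True, text="Inner"),
        ]),
    ]),
    FakeShape(shape_type="PICTURE"),  # no text frame
]

print(list(iter_text(slide_shapes)))  # ['Title', 'Outer', 'Inner']
```

Because `iter_text` is a generator, something like `" ".join(iter_text(slide.shapes))` (after swapping in `MSO_SHAPE_TYPE.GROUP`) should give one string per slide, which matches what the question wants to collect.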
The earlier answer misses deeper "group within a group" cases. A group shape can contain many levels of shapes, including other group shapes, so in many real-life files you need to search the groups recursively.
The previous answer only looks inside top-level groups, but a group at that level may itself contain further groups, so a recursive search is needed. This is easiest to show by reusing the question's code, keeping its first part:
```python
from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE_TYPE

for eachfile in files:
    prs = Presentation(eachfile)
    textrun = []
    for slide in prs.slides:
```
Then replace the inner `for shape in slide.shapes:` loop (the one containing the failing `for text in GroupShape.shapes:` line) with a single call to the recursive function, made once per slide:
```python
        # inside the `for slide in prs.slides:` loop
        textrun = checkrecursivelyfortext(slide.shapes, textrun)
```
Also add the definition of the recursive function (for example, right after the import statements). To make comparison easier, it uses the same text-extraction code as the question, adding only the recursive part:
```python
def checkrecursivelyfortext(shpthissetofshapes, textrun):
    for shape in shpthissetofshapes:
        if shape.shape_type == MSO_SHAPE_TYPE.GROUP:
            textrun = checkrecursivelyfortext(shape.shapes, textrun)
        else:
            if hasattr(shape, "text"):
                print(shape.text)
                textrun.append(shape.text)
    return textrun
```
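If you prefer to avoid recursion, an explicit stack of iterators visits the shapes in the same depth-first order. This is an alternative technique, not the code above: `collect_text` and the `SimpleNamespace` shapes are hypothetical, and `GROUP` again stands in for `MSO_SHAPE_TYPE.GROUP`. Like the function above, it uses `hasattr(shape, "text")` to skip shapes without text.

```python
from types import SimpleNamespace

# Hypothetical stand-in for MSO_SHAPE_TYPE.GROUP from pptx.enum.shapes.
GROUP = "GROUP"


def collect_text(shapes):
    """Return the text of all shapes, descending into groups depth-first."""
    textrun = []
    # Stack of iterators; the top one belongs to the group being read.
    stack = [iter(shapes)]
    while stack:
        shape = next(stack[-1], None)
        if shape is None:
            stack.pop()  # this level is exhausted; resume the parent level
        elif shape.shape_type == GROUP:
            stack.append(iter(shape.shapes))  # descend into the group
        elif hasattr(shape, "text"):
            textrun.append(shape.text)
    return textrun


def text_shape(text):
    return SimpleNamespace(shape_type="TEXT_BOX", text=text)


def group(*children):
    return SimpleNamespace(shape_type=GROUP, shapes=list(children))


slide_shapes = [
    text_shape("A"),
    group(text_shape("B"), group(text_shape("C"))),
    SimpleNamespace(shape_type="PICTURE"),  # no .text attribute
    text_shape("D"),
]
print(" ".join(collect_text(slide_shapes)))  # A B C D
```

Keeping one iterator per open group, rather than pushing all children onto the stack at once, preserves the original left-to-right reading order without reversing lists.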