
Extract hyperlink from pptx

I want to extract hyperlinks from a pptx file. I know how to do it in Word, but does anyone know how to extract them from a pptx?

For example, I have the text below in a pptx, and I want to get the URL https://stackoverflow.com/:


Hello, stackoverflow


I wrote the following Python code to get the text:

from pptx import Presentation
from pptx.opc.constants import RELATIONSHIP_TYPE as RT

ppt = Presentation('data/ppt.pptx')

for i, sld in enumerate(ppt.slides, start=1):
    print(f'-- {i} --')
    for shp in sld.shapes:
        if shp.has_text_frame:
            print(shp.text)

But I only want to print the text and the URL when the text has a hyperlink.

Z.L asked Feb 14 '26 23:02



1 Answer

In python-pptx, a hyperlink can appear on a Run, which I believe is what you're after. This means a given shape can contain zero or more hyperlinks. A hyperlink can also be attached to a shape as a whole, so that clicking the shape follows the link; in that case, the text of the URL does not appear.

from pptx import Presentation

prs = Presentation('data/ppt.pptx')

for slide in prs.slides:
    for shape in slide.shapes:
        if not shape.has_text_frame:
            continue
        for paragraph in shape.text_frame.paragraphs:
            for run in paragraph.runs:
                address = run.hyperlink.address
                if address is None:
                    continue
                print(address)
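
Since the question asks for the text together with the URL, here is a minimal sketch of the same loop that yields both. The function name iter_run_hyperlinks is my own, not part of python-pptx; it only relies on the slides, shapes, text_frame, paragraphs, runs, and hyperlink.address attributes already used above.

```python
from typing import Iterator, Tuple


def iter_run_hyperlinks(prs) -> Iterator[Tuple[int, str, str]]:
    """Yield (slide_number, run_text, url) for every hyperlinked run.

    `prs` is expected to be a python-pptx Presentation object. The helper
    name is hypothetical; it is not provided by the library.
    """
    for slide_no, slide in enumerate(prs.slides, start=1):
        for shape in slide.shapes:
            if not shape.has_text_frame:
                continue  # pictures, groups, tables, etc. are skipped here
            for paragraph in shape.text_frame.paragraphs:
                for run in paragraph.runs:
                    address = run.hyperlink.address
                    if address is None:
                        continue  # plain run, no link attached
                    yield slide_no, run.text, address
```

You would call it as iter_run_hyperlinks(Presentation('data/ppt.pptx')) and print each tuple. Note that, like the loop above, this only looks at top-level shapes with a text frame, so text inside group shapes or table cells is not visited.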

The relevant sections of the documentation are here:
https://python-pptx.readthedocs.io/en/latest/api/text.html#run-objects

and here:
https://python-pptx.readthedocs.io/en/latest/api/action.html#hyperlink-objects
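
For the other case mentioned above, where the link sits on the shape as a whole, a rough sketch along the same lines could read the shape's click action instead. The function name iter_shape_hyperlinks is hypothetical, and the TypeError handling assumes that group shapes refuse a click action, which is my understanding of the library's behavior rather than something stated in this answer.

```python
from typing import Iterator, Tuple


def iter_shape_hyperlinks(prs) -> Iterator[Tuple[int, str, str]]:
    """Yield (slide_number, shape_name, url) for shapes that carry a link.

    `prs` is expected to be a python-pptx Presentation object. The helper
    name is hypothetical; it is not provided by the library.
    """
    for slide_no, slide in enumerate(prs.slides, start=1):
        for shape in slide.shapes:
            try:
                address = shape.click_action.hyperlink.address
            except TypeError:
                # Assumption: shapes that cannot have a click action
                # (group shapes, as far as I know) raise TypeError.
                continue
            if address is not None:
                yield slide_no, shape.name, address
```

Because the URL text does not appear on the slide in this case, the sketch reports the shape's name alongside the address so you can tell which shape the link belongs to.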

scanny answered Feb 16 '26 13:02



