How to extract markdown links with a regex?

Question

I currently have the following Python code, which parses Markdown text to extract the text inside the square brackets of a Markdown link along with the URL:

import re

# Extract []() style links
link_name = "[^]]+"
link_url = "http[s]?://[^)]+"
markup_regex = rf'\[({link_name})]\(\s*({link_url})\s*\)'

for match in re.findall(markup_regex, '[a link](https://www.wiki.com/atopic_(subtopic))'):
    name = match[0]
    url = match[1]
    print(url)
    # url will be https://www.wiki.com/atopic_(subtopic

This fails to capture the full URL because [^)]+ stops at the first closing parenthesis rather than the last one.

How can I make the regex match up to the final closing parenthesis?

Jan · Accepted Answer

For URLs like these, you need a recursive pattern. The standard re module does not support recursion, but the third-party regex module (pip install regex) does:

import regex as re

data = """
It's very easy to make some words **bold** and other words *italic* with Markdown. 
You can even [link to Google!](http://google.com)
[a link](https://www.wiki.com/atopic_(subtopic))
"""

pattern = re.compile(r'\[([^][]+)\](\(((?:[^()]+|(?2))+)\))')

for match in pattern.finditer(data):
    description, _, url = match.groups()
    print(f"{description}: {url}")

This yields

link to Google!: http://google.com
a link: https://www.wiki.com/atopic_(subtopic)

See a demo on regex101.com.

This cryptic little beauty boils down to

\[([^][]+)\]           # capture anything between "[" and "]" into group 1
(\(                    # open group 2 and match "("
    ((?:[^()]+|(?2))+) # match runs of anything except "(" and ")", or recurse into group 2
                       # capture the content into group 3 (the URL)
\))                    # match ")" and close group 2
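If installing the third-party regex module is not an option, the standard re module (which rejects recursion such as (?2)) can still handle URLs with at most one level of nested parentheses by spelling that single level out explicitly. This is a swapped-in, bounded-depth alternative to the recursive pattern, not a replacement for it; the names ONE_LEVEL_LINK and extract_links_one_level are made up for illustration. Deeper nesting such as (b(c)) is simply not matched.

```python
import re

# Standard-library sketch: re has no (?2)-style recursion, so this pattern
# allows at most ONE level of nested parentheses inside the URL.
ONE_LEVEL_LINK = re.compile(
    r"\[([^\[\]]+)\]"            # group 1: link text between "[" and "]"
    r"\("                        # opening "(" of the destination
    r"((?:[^()]|\([^()]*\))+)"   # group 2: single non-paren chars or one "(...)" pair
    r"\)"                        # closing ")" of the destination
)


def extract_links_one_level(text):
    """Return (description, url) pairs matched by ONE_LEVEL_LINK."""
    return ONE_LEVEL_LINK.findall(text)


data = """
You can even [link to Google!](http://google.com)
[a link](https://www.wiki.com/atopic_(subtopic))
"""

for description, url in extract_links_one_level(data):
    print(f"{description}: {url}")
```

Group 2 consumes one character per iteration instead of [^()]+ so that the two alternatives never overlap, which keeps a failed match from backtracking exponentially.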

NOTE: This approach fails for URLs that contain an unbalanced parenthesis, such as

[some nasty description](https://google.com/()
#                                          ^^^

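Markdown can express such URLs, for example by escaping the parenthesis as \( or by wrapping the destination in <...>, and the recursive pattern handles neither form. If you are likely to encounter such URLs, use a proper Markdown parser instead.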
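If a full Markdown parser is more than you need, a small hand-written scanner is one alternative to regular expressions: it walks the text and tracks parenthesis depth instead of pattern-matching. This is a rough sketch under that assumption, not how any particular Markdown library works, and extract_links is a hypothetical name. Balanced parentheses and backslash-escaped ones such as \( stay in the URL, while an unclosed destination like the one above yields no link at all. Angle-bracket destinations, link titles and reference-style links are not handled.

```python
import string


def extract_links(text):
    """Return (description, url) pairs for inline [description](url) links.

    Hypothetical helper, not a full Markdown parser: it tracks parenthesis
    depth in the URL and honours backslash escapes of ASCII punctuation,
    but ignores angle-bracket destinations, link titles and reference links.
    """
    links = []
    i, n = 0, len(text)
    while i < n:
        if text[i] != "[":
            i += 1
            continue
        close = text.find("]", i + 1)
        if close == -1:
            break  # no "]" left, so no further links are possible
        description = text[i + 1:close]
        if "[" in description or close + 1 >= n or text[close + 1] != "(":
            i += 1  # not "[text](" here; resume scanning at the next char
            continue
        j, depth, url_chars = close + 2, 1, []
        while j < n:
            ch = text[j]
            if ch == "\\" and j + 1 < n and text[j + 1] in string.punctuation:
                url_chars.append(text[j + 1])  # keep the escaped char, drop "\"
                j += 2
                continue
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    break  # this ")" closes the destination
            url_chars.append(ch)
            j += 1
        if depth == 0:
            links.append((description, "".join(url_chars)))
            i = j + 1
        else:
            i += 1  # unbalanced: the link never closes, skip this "["
    return links


sample = "[a link](https://www.wiki.com/atopic_(subtopic)) [esc](https://google.com/\\()"
for description, url in extract_links(sample):
    print(f"{description}: {url}")
```

Because depth is counted rather than matched, arbitrarily deep balanced nesting works without recursion support in the regex engine.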

Tags: python, regex, python-re

James Bradbury
