I'm trying to create a regex to parse Markdown links.
regex:
!\[[^\]]*\]\((.*)\s"(.*[^"])"?\s*\)
Test:
foo
![](image 2.png "hello world")
bar
Group 1 will be image 2.png, and group 2 will be hello world.
The problem appears when I try to parse a link without a title:
foo
![](image 2.png)
bar
How should I modify the regex to make it work in both cases?
You have to make the second group optional, since the title is not always there. Named groups also make the pattern a little more readable, something like this perhaps:
!\[[^\]]*\]\((?<filename>.*?)(?=\"|\))(?<optionalpart>\".*\")?\)
https://regex101.com/r/cSbfvF/3/
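For what it's worth, here is a rough sketch of how the named-group pattern could be used from Python. Python's re module writes named groups as (?P<name>...), so the pattern is adapted accordingly. The helper name parse_image is made up for illustration, and trimming the trailing space and the surrounding quotes is my own addition, not part of the pattern itself.

```python
import re

# Named-group pattern from above, rewritten with Python's (?P<name>...) syntax.
IMAGE = re.compile(
    r'!\[[^\]]*\]\((?P<filename>.*?)(?="|\))(?P<optionalpart>".*")?\)'
)

def parse_image(line):
    """Return (filename, title) for the first image on the line, or None."""
    m = IMAGE.search(line)
    if m is None:
        return None
    # The lazy filename group keeps the space that precedes the title.
    filename = m.group("filename").strip()
    # optionalpart is None when the image has no title; otherwise it is quoted.
    title = m.group("optionalpart")
    if title is not None:
        title = title.strip('"')
    return filename, title

print(parse_image('![](image 2.png "hello world")'))
print(parse_image('![](image 2.png)'))
```

Both test lines from the question should produce a result here, with the title coming back as None for the second one.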
Alternatively, your original regex fixed up would be:
!\[[^\]]*\]\((.*?)\s*("(?:.*[^"])")?\s*\)
https://regex101.com/r/u2DwY2/2/
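A quick way to check the fixed-up pattern against both inputs from the question is a small Python script like the sketch below. The pattern is copied unchanged; the names FIXED and parse are only for illustration. Note that group 2 now wraps the quotes, so the title comes back quoted unless you strip it yourself.

```python
import re

# The fixed-up original pattern, unchanged. Group 2 is optional and
# includes the surrounding double quotes.
FIXED = re.compile(r'!\[[^\]]*\]\((.*?)\s*("(?:.*[^"])")?\s*\)')

def parse(line):
    """Return (group 1, group 2) for the first image on the line, or None."""
    m = FIXED.search(line)
    if m is None:
        return None
    return m.group(1), m.group(2)

for line in ['![](image 2.png "hello world")', '![](image 2.png)']:
    print(line, '->', parse(line))
```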
Here's a complete regexp that matches both the alt text and the image URL in a Markdown file, using named capture groups:
(?<alt>!\[[^\]]*\])\((?<filename>.*?)(?=\"|\))\)
The previously accepted answer only accounts for standalone images. However, an image can also be wrapped in a hyperlink, resulting in a nested image reference such as:
[![alt-text](http://example.com/image.png "image title")](http://example.com/some?target)
A more complete regex pattern would look like this:
\[?(!)(?'alt'\[[^\]\[]*\[?[^\]\[]*\]?[^\]\[]*)\]\((?'url'[^\s]+?)(?:\s+(["'])(?'title'.*?)\4)?\)
This pattern also provides named groups for all the potential other info you might want about the image, such as "alt text" or "title".
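As a rough illustration of pulling those named groups out in Python, here is a sketch that uses a simplified variant of the idea rather than the exact pattern above. The assumptions are mine: alt text may not contain brackets, named groups use Python's (?P<name>...) syntax, and the closing quote is matched with a named backreference instead of a numbered one. The names LINKED_IMAGE and image_info are hypothetical.

```python
import re

# Simplified variant (an assumption, not the exact pattern above):
# optional leading "[" for a linked image, then ![alt](url "title").
LINKED_IMAGE = re.compile(
    r'''\[?!\[(?P<alt>[^\]]*)\]\((?P<url>\S+?)(?:\s+(?P<quote>["'])(?P<title>.*?)(?P=quote))?\)'''
)

def image_info(markdown):
    """Return a dict with alt, url and title for the first image, or None."""
    m = LINKED_IMAGE.search(markdown)
    if m is None:
        return None
    # title is None when the image has no quoted title.
    return {"alt": m.group("alt"), "url": m.group("url"), "title": m.group("title")}

nested = '[![alt-text](http://example.com/image.png "image title")](http://example.com/some?target)'
print(image_info(nested))
print(image_info('![logo](logo.png)'))
```

Like the pattern above, this variant does not allow whitespace inside the URL, so a file name such as image 2.png would need a different url group.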