I'm trying to craft a regular expression that will match something like this:
[[uid::page name|page alias]]
for example:
[[nw::Home|Home page]]
The uid and page alias are both optional.
I want to allow the delimiters :: or | to appear only once, and only in the order shown. However, the character : should be allowed anywhere after the uid. Herein lies the problem.
The following regex works pretty well, except that it matches strings where :: appears twice, or in the wrong place:
import re
regex = r'\[\[([\w]+::)?([^|\t\n\r\f\v]+)(\|[^|\t\n\r\f\v]+)?\]\]'
re.match(regex, '[[Home]]') # matches, good
re.match(regex, '[[Home|Home page]]') # matches, good
re.match(regex, '[[nw::Home]]') # matches, good
re.match(regex, '[[nw::Home|Home page]]') # matches, good
re.match(regex, '[[nw|Home|Home page]]') # doesn't match, good
re.match(regex, '[[nw|Home::Home page]]') # matches, bad
re.match(regex, '[[nw::Home::Home page]]') # matches, bad
I have read about negative lookahead and lookbehind assertions, but I can't figure out how to apply them in this case. Any suggestions would be appreciated.
Edit: I would also like to know how to keep the delimiters out of the captured groups. Currently they are included, as shown here:
('nw::', 'Home', '|Home page')
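The delimiters end up in the results because they sit inside the capture groups. A minimal sketch that addresses only this delimiter question (it does not yet reject a second `::`): wrap each optional part in a non-capturing group `(?:...)` and capture only the text on the other side of the delimiter. The variable names are illustrative.

```python
import re

# The question's original pattern: "::" and "|" sit *inside* the capture groups.
original = r'\[\[([\w]+::)?([^|\t\n\r\f\v]+)(\|[^|\t\n\r\f\v]+)?\]\]'

# Same structure, but each optional part is a non-capturing group (?:...)
# and only the uid / alias text itself is captured.
no_delims = r'\[\[(?:(\w+)::)?([^|\t\n\r\f\v]+)(?:\|([^|\t\n\r\f\v]+))?\]\]'

s = '[[nw::Home|Home page]]'
print(re.match(original, s).groups())   # ('nw::', 'Home', '|Home page')
print(re.match(no_delims, s).groups())  # ('nw', 'Home', 'Home page')
```

Optional groups that do not take part in the match come back as `None`, so `[[Home]]` yields `(None, 'Home', None)` with the second pattern.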
If I understand your needs correctly, you could use this:
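\[\[(?:(?P<uid>\w+)::)?(?!.*::)(?P<page>[^|\t\n\r\f\v]+)(?:\|(?P<alias>[^|\t\n\r\f\v]+))?\]\]
I added the negative lookahead (?!.*::) right after the uid group: once the optional uid and its :: have been consumed, it rejects the match if :: appears anywhere later on the line. The :: and | delimiters are also wrapped in non-capturing groups (?:...), so they no longer appear in the captured results. Note that Python's built-in re module needs the (?P<name>...) spelling for named groups; the (?<name>...) form used by PCRE and .NET is not accepted there.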
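A sketch of how the named-group version might be used from Python, checked against the cases from the question. The `WIKILINK` and `parse_link` names are hypothetical, chosen for this example; the pattern itself is the answer's, written with `(?P<name>...)`.

```python
import re

# The answer's named-group pattern, split into commented pieces
# (adjacent raw strings inside the parentheses are concatenated).
WIKILINK = re.compile(
    r'\[\[(?:(?P<uid>\w+)::)?'            # optional uid; "::" stays outside the group
    r'(?!.*::)'                           # reject if "::" appears later on the line
    r'(?P<page>[^|\t\n\r\f\v]+)'          # page name; a single ":" is allowed
    r'(?:\|(?P<alias>[^|\t\n\r\f\v]+))?'  # optional alias; "|" stays outside the group
    r'\]\]'
)


def parse_link(text):
    """Hypothetical helper: return the named groups as a dict, or None if text is not a valid link."""
    m = WIKILINK.match(text)
    return m.groupdict() if m else None


print(parse_link('[[nw::Home|Home page]]'))   # {'uid': 'nw', 'page': 'Home', 'alias': 'Home page'}
print(parse_link('[[Home]]'))                 # {'uid': None, 'page': 'Home', 'alias': None}
print(parse_link('[[nw::Home::Home page]]'))  # None
```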
I have named the capture groups; if you don't want names, here is the version without them:
\[\[(?:(\w+)::)?(?!.*::)([^|\t\n\r\f\v]+)(?:\|([^|\t\n\r\f\v]+))?\]\]
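One thing to keep in mind: `(?!.*::)` looks all the way to the end of the line, not just to the closing `]]`, so with `re.search` on a longer line any later `::` makes the link fail to match. The sketch below compares the unnamed pattern with an alternative that is not from the answer: each page/alias character must be either a non-delimiter or a `:` that is not followed by another `:` (`:(?!:)`), so `::` can never occur inside those parts. `ANSWER`, `PART`, `BOUNDED` and `line` are illustrative names.

```python
import re

# The unnamed-group pattern above; its lookahead scans to the end of the line.
ANSWER = re.compile(r'\[\[(?:(\w+)::)?(?!.*::)([^|\t\n\r\f\v]+)(?:\|([^|\t\n\r\f\v]+))?\]\]')

# Alternative: a page/alias character is either not ":"/"|"/whitespace-except-space,
# or a ":" that is not immediately followed by another ":".
PART = r'(?:[^:|\t\n\r\f\v]|:(?!:))+'
BOUNDED = re.compile(r'\[\[(?:(\w+)::)?(' + PART + r')(?:\|(' + PART + r'))?\]\]')

line = 'see [[nw::Home|Home page]] and a::b'
print(ANSWER.search(line))             # None: the "::" after "]]" trips the lookahead
print(BOUNDED.search(line).groups())   # ('nw', 'Home', 'Home page')
```

Both patterns still allow `]` inside the page and alias, so on a line holding two links a greedy `re.search` match can run from the first `[[` to the last `]]`; excluding `]` from the character classes would prevent that.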