
Regex negative look ahead to match markdown links

We are stuck over a regex issue.

Here is the problem. Consider the following two patterns:

1) [hello] [world]

2) [hello [world]]

We need to write a regex able to match only [world] in the first one and the entire pattern ([hello [world]]) in the second.

Using a negative lookahead, I wrote the following regex, which solves part of the problem:

\[[^\[\]]+\](?!.*\[[^\[\]]+\])

This regex matches [world] in pattern 1) as we want, but for pattern 2) it also matches only the inner [world] instead of the whole [hello [world]].
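The behaviour described above can be reproduced with a short script. This is only an illustrative sketch in Python (the question does not name a language at this point); the helper name `naive_last_link` is hypothetical, and the pattern is the one from the question, unchanged.

```python
import re

# The pattern from the question: a flat [...] with no brackets inside,
# not followed later on the line by another flat [...].
NAIVE = re.compile(r"\[[^\[\]]+\](?!.*\[[^\[\]]+\])")


def naive_last_link(text):
    """Return the first match of the question's pattern, or None (hypothetical helper)."""
    m = NAIVE.search(text)
    return m.group(0) if m else None


print(naive_last_link("[hello] [world]"))   # [world]
print(naive_last_link("[hello [world]]"))   # [world]  (not the whole string)
```

The character class `[^\[\]]+` cannot cross a bracket, so the pattern has no way to span a nested group; that is why the second input yields only the inner part.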

Enrico Massone asked Oct 30 '22 00:10


1 Answer

In .NET regex, you can use balancing groups to match nested balanced brackets. To match the last [...] substring (with nested brackets) on a line, you need a fairly long pattern such as:

\[(?:[^][]+|(?<c>)\[|(?<-c>)])*(?(c)(?!))](?!.*\[(?:[^][]+|(?<d>)\[|(?<-d>)])*(?(d)(?!))])

See the regex demo at RegexStorm.net.

Details

  • \[(?:[^][]+|(?<c>)\[|(?<-c>)])*(?(c)(?!))] - a [...] substring with nested brackets:
    • \[ - a [ char
    • (?:[^][]+|(?<c>)\[|(?<-c>)])* - zero or more occurrences of:
      • [^][]+| - 1 or more chars other than ] and [ or
      • (?<c>)\[| - an empty capture is pushed onto the Group "c" stack and a [ is matched, or
      • (?<-c>)] - a capture is popped from the Group "c" stack and a ] is matched
    • (?(c)(?!)) - a conditional that fails the match if Group "c" stack is not empty
    • ] - a ] char
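  • (?!.*\[(?:[^][]+|(?<d>)\[|(?<-d>)])*(?(d)(?!))]) - a negative lookahead that fails the match if, after any zero or more chars other than newline chars, there is another [...] substring matching the same pattern as above (tracked with a separate Group "d" stack).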
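
For engines without balancing groups, the same push/pop idea can be written as a plain loop. The following is a minimal Python sketch, not the .NET pattern itself: it swaps the regex for an explicit depth counter that plays the role of the Group "c" stack. The function name `last_bracket_group` is hypothetical, and the sketch assumes the brackets on the line are balanced; on unbalanced input it may behave differently from the regex.

```python
from typing import Optional


def last_bracket_group(line: str) -> Optional[str]:
    """Return the last top-level balanced [...] substring of line, or None.

    Assumption: brackets in `line` are balanced. The depth counter stands in
    for the regex's Group "c" stack: '[' pushes, ']' pops.
    """
    depth = 0      # how many '[' are currently open
    start = -1     # index of the '[' that opened the current top-level group
    last = None    # most recent completed top-level group
    for i, ch in enumerate(line):
        if ch == "[":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
            if depth == 0:
                # A top-level group just closed; later ones overwrite it,
                # which mirrors the "not followed by another group" lookahead.
                last = line[start:i + 1]
    return last


print(last_bracket_group("[hello] [world]"))   # [world]
print(last_bracket_group("[hello [world]]"))   # [hello [world]]
```

Keeping only the most recent completed group replaces the negative lookahead: instead of checking that no further group follows, the loop simply lets any later group win.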
Wiktor Stribiżew answered Nov 15 '22 08:11