I'm looking for a regex to find named capturing groups in (other) regex strings.
Example: I want to find (?P<country>m((a|b).+)n), (?P<city>.+) and (?P<street>(5|6)\. .+) in the following regex:
/(?P<country>m((a|b).+)n)/(?P<city>.+)/(?P<street>(5|6)\. .+)
I tried the following regex to find the named capturing groups:
var subGroups string = `(\(.+\))*?`
var prefixedSubGroups string = `.+` + subGroups
var postfixedSubGroups string = subGroups + `.+`
var surroundedSubGroups string = `.+` + subGroups + `.+`
var capturingGroupNameRegex *regexp.Regexp = regexp.MustCompile(
`(?U)` +
`\(\?P<.+>` +
`(` + prefixedSubGroups + `|` + postfixedSubGroups + `|` + surroundedSubGroups + `)` +
`\)`)
The (?U) flag makes greedy quantifiers (+ and *) non-greedy, and non-greedy quantifiers (*?) greedy. Details are in the Go regex documentation.
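To illustrate the flag swap described above, here is a minimal sketch using Go's standard regexp package. The helper name firstMatch and the input "aaa" are made up for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch is a hypothetical helper: it compiles pattern and returns
// the leftmost match in s (or "" if there is none).
func firstMatch(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

func main() {
	s := "aaa"
	fmt.Println(firstMatch(`a+`, s))      // default: + is greedy
	fmt.Println(firstMatch(`(?U)a+`, s))  // with U: + becomes non-greedy
	fmt.Println(firstMatch(`(?U)a+?`, s)) // with U: +? becomes greedy
}
```

Under the documented semantics of the U flag, the first and third calls should match the whole input and the second only a single "a".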
But it doesn't work because the parentheses are not matched correctly.
Matching arbitrarily nested parentheses correctly is not possible with regular expressions because arbitrary (recursive) nesting cannot be described by a regular language.
Some modern regex flavors do support recursion (Perl, PCRE) or balanced matching (.NET), but Go is not one of them (the docs explicitly say that Perl's (?R) construct is not supported by the RE2 library that Go's regex package appears to be based on). You need to build a recursive descent parser, not a regex.
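For the specific task in the question, a full recursive descent parser may be more than is needed. The sketch below swaps in a simpler technique: a hand-written scanner that counts parenthesis depth to find the closing parenthesis of each (?P<name>...) group. The function names (findNamedGroups, matchingParen, skip) are invented for this example. It assumes the pattern is well formed, skips backslash escapes and simple [...] character classes, and does not handle \Q...\E quoting or parentheses after a nested POSIX class such as [[:alpha:](].

```go
package main

import (
	"fmt"
	"strings"
)

// skip returns the index of the last byte of the escape sequence or
// character class that starts at i, or i itself if neither starts there.
func skip(p string, i int) int {
	switch p[i] {
	case '\\':
		if i+1 < len(p) {
			return i + 1
		}
	case '[':
		j := i + 1
		if j < len(p) && p[j] == '^' {
			j++
		}
		if j < len(p) && p[j] == ']' {
			j++ // a leading ']' is a literal member of the class
		}
		for ; j < len(p); j++ {
			if p[j] == '\\' {
				j++ // step over the escaped byte as well
				continue
			}
			if p[j] == ']' {
				return j
			}
		}
	}
	return i
}

// matchingParen returns the index of the ')' that closes the '(' at
// position open, or -1 if the parentheses are unbalanced.
func matchingParen(p string, open int) int {
	depth := 0
	for i := open; i < len(p); i++ {
		if j := skip(p, i); j != i {
			i = j
			continue
		}
		switch p[i] {
		case '(':
			depth++
		case ')':
			depth--
			if depth == 0 {
				return i
			}
		}
	}
	return -1
}

// findNamedGroups returns the source text of every (?P<name>...) group
// in p, in order of appearance, including named groups nested in others.
func findNamedGroups(p string) []string {
	var groups []string
	for i := 0; i < len(p); i++ {
		if j := skip(p, i); j != i {
			i = j
			continue
		}
		if strings.HasPrefix(p[i:], "(?P<") {
			if end := matchingParen(p, i); end >= 0 {
				groups = append(groups, p[i:end+1])
			}
		}
	}
	return groups
}

func main() {
	pattern := `/(?P<country>m((a|b).+)n)/(?P<city>.+)/(?P<street>(5|6)\. .+)`
	for _, g := range findNamedGroups(pattern) {
		fmt.Println(g)
	}
}
```

Run on the regex from the question, this should print the three named groups, one per line. Because the depth counter is a plain integer rather than part of the pattern, nesting of any depth is handled, which is exactly what a regular expression cannot express.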