
Regex to find named capturing groups with Go programming language

Tags: regex, go

I'm looking for a regex to find named capturing groups in (other) regex strings.

Example: I want to find (?P<country>m((a|b).+)n), (?P<city>.+) and (?P<street>(5|6)\. .+) in the following regex:

/(?P<country>m((a|b).+)n)/(?P<city>.+)/(?P<street>(5|6)\. .+)

I tried the following regex to find the named capturing groups:

// Optional run of parenthesized sub-groups (the part that fails on nesting).
var subGroups string = `(\(.+\))*?`
var prefixedSubGroups string = `.+` + subGroups
var postfixedSubGroups string = subGroups + `.+`
var surroundedSubGroups string = `.+` + subGroups + `.+`

// regexp.MustCompile returns a *regexp.Regexp.
var capturingGroupNameRegex *regexp.Regexp = regexp.MustCompile(
    `(?U)` +
    `\(\?P<.+>` +
    `(` + prefixedSubGroups + `|` + postfixedSubGroups + `|` + surroundedSubGroups + `)` +
    `\)`)

The (?U) flag swaps greediness: greedy quantifiers (+ and *) become non-greedy, and non-greedy quantifiers (*?) become greedy. Details are in the Go regexp documentation.
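A minimal sketch of that flag's effect, using only the standard regexp package. The helper name firstMatch is hypothetical and exists only for this illustration:

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch is a hypothetical helper for this illustration: it compiles
// expr and returns the leftmost match in s ("" if there is none).
func firstMatch(expr, s string) string {
	return regexp.MustCompile(expr).FindString(s)
}

func main() {
	s := "aaa"
	fmt.Println(firstMatch(`a+`, s))      // default: + is greedy
	fmt.Println(firstMatch(`(?U)a+`, s))  // (?U): + becomes non-greedy
	fmt.Println(firstMatch(`(?U)a+?`, s)) // (?U): +? becomes greedy
}
```

Under (?U) the plain quantifier stops as early as possible and the `?`-suffixed one takes as much as possible, so the second call should return a single "a" while the first and third return the whole string.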

But it doesn't work because the parentheses are not matched correctly.

deamon asked Nov 11 '12 10:11



1 Answer

Matching arbitrarily nested parentheses correctly is not possible with regular expressions because arbitrary (recursive) nesting cannot be described by a regular language.

Some modern regex flavors do support recursion (Perl, PCRE) or balanced matching (.NET), but Go's is not one of them: the docs explicitly say that Perl's (?R) construct is not supported by RE2, the library whose syntax Go's regexp package accepts. You need to build a recursive descent parser, not a regex.
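For merely locating the groups, a full parser may be more than is needed. The sketch below is one possible approach, not a complete regex parser: it scans the pattern once and keeps a stack of open parentheses. The function name findNamedGroups is hypothetical. It assumes backslash escapes and simple [...] character classes only, and does not handle \Q...\E quoting or a literal ] placed first in a class.

```go
package main

import (
	"fmt"
	"strings"
)

// findNamedGroups returns every "(?P<name>...)" group found in pattern,
// including nested ones, in the order their closing parentheses appear.
// Hypothetical helper: a sketch under the assumptions stated above.
func findNamedGroups(pattern string) []string {
	var groups []string
	var stack []int // start offsets of the groups that are currently open
	inClass := false
	for i := 0; i < len(pattern); i++ {
		c := pattern[i]
		switch {
		case c == '\\':
			i++ // skip the escaped byte, e.g. \( or \.
		case inClass:
			if c == ']' {
				inClass = false
			}
		case c == '[':
			inClass = true // parentheses inside [...] are literals
		case c == '(':
			stack = append(stack, i)
		case c == ')' && len(stack) > 0:
			start := stack[len(stack)-1]
			stack = stack[:len(stack)-1]
			if strings.HasPrefix(pattern[start:], "(?P<") {
				groups = append(groups, pattern[start:i+1])
			}
		}
	}
	return groups
}

func main() {
	re := `/(?P<country>m((a|b).+)n)/(?P<city>.+)/(?P<street>(5|6)\. .+)`
	for _, g := range findNamedGroups(re) {
		fmt.Println(g)
	}
}
```

Run against the regex from the question, this should print the three groups (?P<country>m((a|b).+)n), (?P<city>.+) and (?P<street>(5|6)\. .+). The stack is what a regular expression cannot express: it remembers how many parentheses are still open at each point.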

Tim Pietzcker answered Oct 05 '22 23:10
