 

Go ReplaceAllString

Tags:

regex

go

I read the example code on the golang.org website. Essentially, the code looks like this:

re := regexp.MustCompile("a(x*)b")
fmt.Println(re.ReplaceAllString("-ab-axxb-", "T"))
fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1"))
fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1W"))
fmt.Println(re.ReplaceAllString("-ab-axxb-", "${1}W"))

The output is like this:

-T-T-
--xx-
---
-W-xxW-

I understand the first output, but I don't understand the remaining three. Can someone explain results 2, 3 and 4 to me? Thanks.

asked Jan 08 '16 09:01 by Qian Chen



2 Answers

The most intriguing line is fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1W")). The ReplaceAllString docs say:

Inside repl, $ signs are interpreted as in Expand

And Expand says:

In the template, a variable is denoted by a substring of the form $name or ${name}, where name is a non-empty sequence of letters, digits, and underscores. A reference to an out of range or unmatched index or a name that is not present in the regular expression is replaced with an empty slice.

In the $name form, name is taken to be as long as possible: $1x is equivalent to ${1x}, not ${1}x, and, $10 is equivalent to ${10}, not ${1}0.
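A minimal sketch of that longest-name rule, reusing the question's pattern on the input "-axxb-" (chosen here so that group 1 captures "xx"; munchExamples is a hypothetical helper name):

```go
package main

import (
	"fmt"
	"regexp"
)

// munchExamples applies templates that differ only in braces to "-axxb-",
// where group 1 of a(x*)b captures "xx".
func munchExamples() []string {
	re := regexp.MustCompile("a(x*)b")
	src := "-axxb-"
	templates := []string{
		"$1x",   // read as ${1x}: no such group, so the match becomes ""
		"${1}x", // group 1 followed by a literal x
		"$10",   // read as ${10}: index out of range, so the match becomes ""
		"${1}0", // group 1 followed by a literal 0
	}
	out := make([]string, 0, len(templates))
	for _, t := range templates {
		out = append(out, re.ReplaceAllString(src, t))
	}
	return out
}

func main() {
	for _, s := range munchExamples() {
		fmt.Println(s)
	}
}
```

The two templates without braces wipe out the whole match ("--"), while the braced ones keep "xx" and append the literal character ("-xxx-" and "-xx0-").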

So, in the 3rd replacement, $1W is treated as ${1W}, and since this group is not initialized, an empty string is used for the replacement.

When I say "the group is not initialized", I mean that no group named 1W is defined in the regex pattern, so nothing was captured for it during the match operation. ReplaceAllString first finds every match and then replaces each whole match with the expanded replacement template; the $ references in that template are resolved against the submatches recorded while matching. Since the pattern has no 1W group, $1W expands to an empty string, both matches (ab and axxb) are deleted, and only the three hyphens remain: ---.
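A quick way to confirm that the pattern has no group called 1W is to ask the compiled regexp itself. This sketch assumes Go 1.15 or later (for Regexp.SubexpIndex); inspect is a hypothetical helper:

```go
package main

import (
	"fmt"
	"regexp"
)

// inspect reports what the question's pattern knows about its groups and
// what the $1W template produces.
func inspect() (names []string, idx1W int, replaced string) {
	re := regexp.MustCompile("a(x*)b")
	// Index 0 stands for the whole match; group 1 is unnamed, so both are "".
	names = re.SubexpNames()
	// -1 means no group is called "1W".
	idx1W = re.SubexpIndex("1W")
	// Every whole match ("ab", then "axxb") is replaced by the empty
	// expansion of $1W, so only the hyphens survive.
	replaced = re.ReplaceAllString("-ab-axxb-", "$1W")
	return names, idx1W, replaced
}

func main() {
	names, idx, replaced := inspect()
	fmt.Printf("%q %d %s\n", names, idx, replaced)
}
```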

The 2nd and 4th replacements are easier to understand and are also described in the other answer. $1 simply refers to the text captured by the first capturing group (the subpattern enclosed in a pair of unescaped parentheses): an empty string for ab and xx for axxb, which gives --xx-. Example 4 works the same way, except that a literal W follows the captured text, which gives -W-xxW-.
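To see what group 1 captured in each match, you can list the submatches (submatches is a hypothetical helper wrapping FindAllStringSubmatch):

```go
package main

import (
	"fmt"
	"regexp"
)

// submatches returns [whole match, group 1] for every match of the
// question's pattern in src.
func submatches(src string) [][]string {
	re := regexp.MustCompile("a(x*)b")
	return re.FindAllStringSubmatch(src, -1)
}

func main() {
	for _, m := range submatches("-ab-axxb-") {
		fmt.Printf("match %q, group 1 %q\n", m[0], m[1])
	}
}
```

This prints an empty group 1 for "ab" and "xx" for "axxb", which are exactly the pieces spliced into results 2 and 4.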

You can think of the braces in ${1} as a way to disambiguate the replacement pattern: they mark where the group name ends.

Now, if you need the results to be consistent, use a named capturing group (?P<1W>...):

re := regexp.MustCompile("a(?P<1W>x*)b")  // <= See here, pattern updated
fmt.Println(re.ReplaceAllString("-ab-axxb-", "T"))
fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1"))
fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1W"))
fmt.Println(re.ReplaceAllString("-ab-axxb-", "${1}W"))

Results:

-T-T-
--xx-
--xx-
-W-xxW-

The 2nd and 3rd lines now produce the same output because the group named 1W is also the first group, so the numbered reference $1 and the named reference $1W point to the same captured text.
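A sketch to check the named-group version, assuming Go 1.15 or later (for SubexpIndex). namedResults is a hypothetical helper, and the ${1W} template is added here only to show that the braced form of the name behaves the same:

```go
package main

import (
	"fmt"
	"regexp"
)

// namedResults applies several templates to "-ab-axxb-" using the pattern
// whose first group is named 1W.
func namedResults() (idx int, out []string) {
	re := regexp.MustCompile("a(?P<1W>x*)b")
	// The group is both number 1 and name "1W".
	idx = re.SubexpIndex("1W")
	for _, t := range []string{"$1", "$1W", "${1W}", "${1}W"} {
		out = append(out, re.ReplaceAllString("-ab-axxb-", t))
	}
	return idx, out
}

func main() {
	idx, out := namedResults()
	fmt.Println(idx)
	for _, s := range out {
		fmt.Println(s)
	}
}
```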

answered Sep 21 '22 12:09 by Wiktor Stribiżew



$number refers to a subgroup by its index in the regex, and $name refers to a subgroup by its name.

fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1"))

$1 is the text matched by subgroup 1 in the regex, i.e. by x*.

fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1W"))

$1W: there is no subgroup named 1W, so every match is replaced with an empty string.

fmt.Println(re.ReplaceAllString("-ab-axxb-", "${1}W"))

$1 and ${1} are the same. ${1}W replaces every match with the text of subgroup 1 followed by W.

For more information, see https://golang.org/pkg/regexp/

answered Sep 19 '22 12:09 by trquoccuong
