regex_replace, why does it lose the $1?

std::string s = " 'I'd go.' ";
s = std::regex_replace(s, std::regex("((^| )')|('($| ))"), "$1(Quotation, )");
std::cout << s; // '(Quotation, )I'd go.(Quotation, )

I want to replace each ' with '(Quotation, ), that is, keep the original ' and add (Quotation, ) after it. So I use $1 to refer to the original '. I don't want to replace the ' in I'd, though.

The ^ alternative lets a ' at the start of the string match (not only a ' after a space), and $ does the same for a ' at the end of the string.

The expected result is:

'(Quotation, )I'd go.' (Quotation, )

But the actual result is:

'(Quotation, )I'd go.(Quotation, )

The replacement for the opening quote works, but the closing quote loses its '. Why?

Zhang asked Jun 28 '18 07:06




2 Answers

This happens because the ' at the end of the string is captured in Group 3, not in Group 1, so $1 is empty for that match:

((^| )')|('($| ))
 | 2 |     | 4 |
|  1   | |  3   |

You may refer to each of the groups with the $1, $2, $3 and $4 backreferences (and higher numbers if there are more groups); you may also refer to the whole match with the $& backreference.

So adding $3 to the replacement string solves the issue:

s = std::regex_replace(s, std::regex("((^| )')|('($| ))"), "$1$3(Quotation, )");
// =>  '(Quotation, )I'd go.' (Quotation, )

See the C++ demo

An alternative solution might look like this:

s = std::regex_replace(s, std::regex("(?:^|\\s)'|'(?!\\S)"), "$&(Quotation, )");

The (?:^|\s)'|'(?!\S) regex matches:

  • (?:^|\s)' - the start of the string or a whitespace char, followed by a '
  • | - or
  • '(?!\S) - a ' that is followed by a whitespace char or the end of the string

The $& backreference inserts the whole match back into the result. See this regex demo online (ignore the replacement there; the site does not support the $& backreference).

NOTE: Since C++11 you may use raw string literals when defining regexps. Backslashes are not escape characters inside them, so they must not be doubled: R"((?:^|\s)'|'(?!\S))".

Wiktor Stribiżew answered Oct 13 '22 00:10


You don't need several alternations ('or's) in your regex. Try ^\s*(').*(')\s*$ and use the backreferences in the replacement.

PVRT answered Oct 13 '22 00:10