
Bug in VS13 regex: wrong order for alternatives?

I need a regex that captures an argument between parentheses. Blanks before and after the argument should not be captured. For example, "( ab & c )" should return "ab & c". The argument can be enclosed in single quotes if leading or trailing blanks are needed, so "( ' ab & c ' )" should return " ab & c ".
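To pin the requirement down independently of any regex engine, here is a rough reference sketch that does the same extraction by hand. The name extract_argument is made up for illustration, and the set of characters treated as blanks (space, tab, CR, LF) is an assumption; \s in a regex covers a few more.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the question): extracts the argument from
// "( arg )" or "( 'arg' )" without std::regex. Returns false if the input
// is not wrapped in parentheses or holds nothing but blanks.
bool extract_argument(const std::wstring& text, std::wstring& argument)
{
    const wchar_t* blanks = L" \t\r\n";   // assumption: what counts as a blank
    if (text.size() < 2 || text.front() != L'(' || text.back() != L')')
        return false;

    // Drop the parentheses, then trim blanks on both sides.
    std::wstring inner = text.substr(1, text.size() - 2);
    const std::size_t first = inner.find_first_not_of(blanks);
    if (first == std::wstring::npos)
        return false;
    const std::size_t last = inner.find_last_not_of(blanks);
    inner = inner.substr(first, last - first + 1);

    // A quoted argument keeps whatever blanks sit between the quotes.
    if (inner.size() >= 3 && inner.front() == L'\'' && inner.back() == L'\'')
        argument = inner.substr(1, inner.size() - 2);
    else
        argument = inner;
    return true;
}
```

Calling it with L"( ab & c )" and L"( ' ab & c ' )" gives a baseline to compare any regex result against.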

wstring String = L"( ' ab & c ' )";
wsmatch Matches;
regex_match( String, Matches, wregex(L"\\(\\s*(?:'(.+)'|(.+?))\\s*\\)") );
wcout << L"<" + Matches[1].str() + L"> " + L"<" + Matches[2].str() + L">" + L"\n";
// Results in "<> < ' ab & c '>", not OK

It seems that the second alternative matched, but its capture also includes the space in front of the first quote! That space should have been consumed by the \s* after the opening parenthesis.

Removing the second alternative:

regex_match( String, Matches, wregex(L"\\(\\s*(?:'(.+)')\\s*\\)") );
wcout << L"<" + Matches[1].str() + L">" + L"\n";
// Results in "< ab & c >", OK

Wrapping the alternatives in a capturing group:

regex_match( String, Matches, wregex(L"\\(\\s*('(.+)'|(.+?))\\s*\\)") );
wcout << L"<" + Matches[1].str() + L"> " + L"<" + Matches[2].str() + L"> " + L"<" + Matches[3].str() + L">" + L"\n";
// Results in "<' ab & c '> < ab & c > <>", OK

Am I overlooking anything?
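For anyone who wants to reproduce this on another compiler, here is a minimal self-contained sketch, assuming C++11 or later. GroupReport and probe_alternatives are hypothetical names invented for illustration; the pattern is the one from the first snippet, written as a raw string literal. Under ECMAScript matching rules \s* is greedy and the left alternative is tried first, so for the quoted input the quoted alternative (group 1) would be expected to take part.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: reports which alternative's capture group took part.
struct GroupReport {
    bool matched = false;       // did regex_match succeed at all?
    bool quoted_used = false;   // group 1, the '(.+)' alternative
    bool bare_used = false;     // group 2, the (.+?) alternative
    std::wstring value;         // text of whichever group took part
};

GroupReport probe_alternatives(const std::wstring& text)
{
    // Same pattern as in the first snippet of the question.
    static const std::wregex pattern(LR"(\(\s*(?:'(.+)'|(.+?))\s*\))");

    GroupReport report;
    std::wsmatch m;
    if (!std::regex_match(text, m, pattern))
        return report;

    report.matched = true;
    report.quoted_used = m[1].matched;
    report.bare_used = m[2].matched;
    report.value = m[1].matched ? m[1].str() : m[2].str();
    return report;
}
```

Checking sub_match::matched instead of only printing str() makes it unambiguous which alternative was taken, since an empty string and a non-participating group print the same.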

Jan Laloux asked Apr 29 '15 13:04


1 Answer

Here is my suggestion, which merges the two alternatives into one:

wstring String = L"( ' ab & c ' )";
wsmatch Matches;
regex_match( String, Matches, wregex(L"\\(\\s*(')?([^']+)\\1\\s*\\)") );
wcout << L"<" + Matches[2].str() + L"> " + L"\n";

The \(\s*(')?([^']+)\1\s*\) regex uses a back-reference (\1) to require a closing ' whenever an opening ' was matched, so an unbalanced argument such as 'something is not captured. The value ends up in Group 2.
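The same pattern can be wrapped in a small function for testing. This is only a sketch: extract_with_backreference is a hypothetical name, and the pattern is the one above written as a raw string literal.

```cpp
#include <regex>
#include <string>

// Hypothetical wrapper around the pattern above. Returns true and fills
// `argument` from group 2 when the whole input matches.
bool extract_with_backreference(const std::wstring& text, std::wstring& argument)
{
    // Group 1: optional opening quote.
    // Group 2: the argument itself, which may not contain a quote.
    // \1 then demands the same quote again before the closing parenthesis.
    static const std::wregex pattern(LR"(\(\s*(')?([^']+)\1\s*\))");

    std::wsmatch m;
    if (!std::regex_match(text, m, pattern))
        return false;
    argument = m[2].str();
    return true;
}
```

Two things are worth checking on the target compiler. The unquoted case depends on how the library treats a back-reference to a group that did not participate, and implementations are known to differ there. Also, [^']+ is greedy, so a trailing blank in an unquoted argument may end up in Group 2.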


Wiktor Stribiżew answered Nov 14 '22 06:11