In http://llvm.org/svn/llvm-project/libcxx/trunk/test/re/re.alg/re.alg.match/ecma.pass.cpp, the following test exists:
std::cmatch m;
const char s[] = "tournament";
assert(!std::regex_match(s, m, std::regex("tour|to|tournament")));
assert(m.size() == 0);
Why should this match fail?
On VC++2012 and boost, the match succeeds.
In JavaScript on Chrome and Firefox, "tournament".match(/^(?:tour|to|tournament)$/) succeeds.
Only on libc++, the match fails.
I believe the test is correct. It is instructive to search for "tournament" in all of the libc++ tests under re.alg, and compare how the different engines treat the regex("tour|to|tournament"), and how regex_search differs from regex_match.
Let's start with regex_search:
awk, egrep, extended:
regex_search("tournament", m, regex("tour|to|tournament"))
matches the entire input string: "tournament".
ECMAScript:
regex_search("tournament", m, regex("tour|to|tournament"))
matches only part of the input string: "tour".
grep, basic:
regex_search("tournament", m, regex("tour|to|tournament"))
doesn't match at all, because the '|' character is not special in these grammars.
awk, egrep and extended match as much as they can with alternation (leftmost, longest). ECMAScript alternation, however, is "ordered", as specified in ECMA-262: branches are tried left to right, and once a branch matches and the rest of the pattern can follow it, ECMAScript stops trying the later branches, even if one of them would match more. The standard includes this example:
/a|ab/.exec("abc")
returns the result "a" and not "ab".
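The differences above can be checked with a small program. This is only a minimal sketch: first_match and check_alternation_examples are hypothetical helpers written for this illustration (they are not part of <regex>), and the asserted results are the ones described above for each grammar.

```cpp
#include <cassert>
#include <regex>
#include <string>

namespace rc = std::regex_constants;

// Hypothetical helper (not part of <regex>): returns the text matched by
// regex_search under the given grammar, or an empty string if nothing matches.
std::string first_match(const std::string& input,
                        const std::string& pattern,
                        rc::syntax_option_type grammar)
{
    const std::regex re(pattern, grammar);
    std::smatch m;
    return std::regex_search(input, m, re) ? m.str(0) : std::string();
}

// Hypothetical self-check covering the cases discussed in the text.
void check_alternation_examples()
{
    const std::string pattern = "tour|to|tournament";

    // ECMAScript: ordered alternation, the first branch that matches wins.
    assert(first_match("tournament", pattern, rc::ECMAScript) == "tour");

    // extended: the longest alternative wins.
    assert(first_match("tournament", pattern, rc::extended) == "tournament");

    // basic: '|' is an ordinary character, so the pattern is the literal
    // text "tour|to|tournament", which does not occur in the input.
    assert(first_match("tournament", pattern, rc::basic).empty());

    // The ECMA-262 example: /a|ab/ applied to "abc" yields "a", not "ab".
    assert(first_match("abc", "a|ab", rc::ECMAScript) == "a");
    assert(first_match("abc", "a|ab", rc::extended) == "ab");
}
```

Calling check_alternation_examples() from main should pass silently if the library follows the grammar rules described here.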
<plug>
This is also discussed in depth in Mastering Regular Expressions by Jeffrey E.F. Friedl. I couldn't have implemented <regex> without this book. And I will freely admit that there is still much more that I don't know about regular expressions than what I do know.
At the end of the chapter on alternation the author states:
If you understood everything in this chapter the first time you read it, you probably didn't read it in the first place.
Believe it!
</plug>
Anyway, ECMAScript matches only "tour". The regex_match algorithm returns success only if the entire input string is matched. Since only the first 4 characters of the input string are matched, ECMAScript, unlike awk, egrep and extended, returns false with a zero-sized cmatch.
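To make this reasoning concrete, here is a minimal sketch. whole_input_matches and search_spans_input are hypothetical helpers introduced only for this illustration. The second one models the argument above: it asks whether the match that regex_search finds already covers the entire input.

```cpp
#include <regex>
#include <string>

namespace rc = std::regex_constants;

// Hypothetical helper: true when regex_match accepts the whole input.
bool whole_input_matches(const std::string& input,
                         const std::string& pattern,
                         rc::syntax_option_type grammar)
{
    const std::regex re(pattern, grammar);
    return std::regex_match(input, re);
}

// Hypothetical helper: true when the match found by regex_search starts at
// position 0 and is as long as the input, i.e. it spans the entire string.
bool search_spans_input(const std::string& input,
                        const std::string& pattern,
                        rc::syntax_option_type grammar)
{
    const std::regex re(pattern, grammar);
    std::smatch m;
    return std::regex_search(input, m, re)
        && m.position(0) == 0
        && static_cast<std::string::size_type>(m.length(0)) == input.size();
}
```

With "tournament" and "tour|to|tournament", search_spans_input is true for extended (the match is "tournament") and false for ECMAScript (the match is "tour"), and whole_input_matches is true for extended. The ECMAScript result of whole_input_matches is deliberately left unstated here, because it is exactly the point in dispute: per the question, libc++ returns false while VC++2012 and boost return true.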