NOTE: When I say the regex [\0], I mean the regex [\0] itself (not contained in a C-style string, where it would be written "[\\0]"). If I haven't put quotes around it, it's not a C-style string, and the backslashes shouldn't be read as C-style string escapes.
Inspired by this question and my investigation, I tried the following code in clang 3.4:
#include <regex>
#include <string>

int main()
{
    std::string input = "foobar";
    std::regex regex("[^\\0]*"); // Note, this is "\\0", not "\0"!
    return std::regex_match(input, regex);
}
Apparently, clang's standard library (libc++) doesn't like this, as it throws at runtime:
std::__1::regex_error: The expression contained an invalid escaped character, or a trailing escape.
The culprit seems to be the [^\0] part (changing it to [^\n] or something similar works fine); the engine apparently treats \0 as an invalid escape. I want to clarify that I'm not talking about the '\0' character (null character) or the '\n' character (newline character). In C-style strings, what I'm talking about is "\\0" (a string containing backslash zero) and "\\n" (a string containing backslash n). The regex engine seems to interpret "\\n" as a newline, but it chokes on "\\0".
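To make the comparison between escapes easier to reproduce, here is a minimal sketch of a probe that reports whether a pattern is accepted. The helper name pattern_compiles is my own invention for illustration, not something from <regex>, and it assumes the default ECMAScript grammar:

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of <regex>): reports whether the standard
// library accepts `pattern` under the default ECMAScript grammar.
bool pattern_compiles(const std::string& pattern)
{
    try {
        std::regex re(pattern); // throws std::regex_error if the pattern is rejected
        (void)re;               // only the construction matters here
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}
```

Calling pattern_compiles("[^\\n]*") should return true, and a pattern with a trailing escape such as "abc\\" should return false. What pattern_compiles("[^\\0]*") returns depends on the library and version you build against, which is exactly what is in question here.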
The C++11 standard says in section 28.13 [re.grammar] that:
The regular expression grammar recognized by basic_regex objects constructed with the ECMAScript flag is that specified by ECMA-262, except as specified below.
I'm no expert on ECMA-262, but I tried the regular expression on JSFiddle and it's working fine there in JavaScript land.
So now I'm wondering if the regex [^\0] is valid in ECMA-262 and the C++11 standard removed support for it (in the stuff following "... except as specified below.").
Question: Is the \0 escape sequence (not the null character; in a string literal this would be "\\0") legal in a C++11 regular expression? Is it legal in ECMA-262 (or are browser JS VMs just being "too" lenient)? What's the cause/justification for the different behaviors?
This was a bug in libc++'s implementation of <regex>. It should be fixed now in the trunk, and this should propagate to OS X's release code eventually.
Also, here is the excerpt from the ECMA-262 standard that is the basis for this bug report:
15.10.2.11 DecimalEscape
The production DecimalEscape :: DecimalIntegerLiteral [lookahead ∉ DecimalDigit] evaluates as follows:
- Let i be the MV of DecimalIntegerLiteral.
- If i is zero, return the EscapeValue consisting of a <NUL> character (Unicode value 0000).
- Return the EscapeValue consisting of the integer i.
Note: ... \0 represents the <NUL> character and cannot be followed by a decimal digit.
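The evaluation steps quoted above can be sketched roughly as follows. This is only my illustration of the excerpt, not code from libc++ or any other implementation; the EscapeValue struct and the evaluate_decimal_escape function are hypothetical names, and the sketch ignores overflow for very long digit runs:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch; not part of any library.
struct EscapeValue {
    enum Kind { Invalid, Character, Integer } kind;
    unsigned long value; // character code (Character) or the integer i (Integer)
};

// Evaluates the text that follows a backslash, following the quoted
// DecimalEscape steps. Assumes `rest` starts right after the backslash.
EscapeValue evaluate_decimal_escape(const std::string& rest)
{
    if (rest.empty() || !std::isdigit(static_cast<unsigned char>(rest[0])))
        return {EscapeValue::Invalid, 0};

    std::size_t pos = 0;
    unsigned long i = 0;
    if (rest[0] == '0') {
        pos = 1; // the DecimalIntegerLiteral "0" is a single digit
    } else {
        // NonZeroDigit followed by any further decimal digits
        while (pos < rest.size() &&
               std::isdigit(static_cast<unsigned char>(rest[pos]))) {
            i = i * 10 + static_cast<unsigned long>(rest[pos] - '0');
            ++pos;
        }
    }

    // [lookahead ∉ DecimalDigit]: the literal must not be followed by a digit
    if (pos < rest.size() && std::isdigit(static_cast<unsigned char>(rest[pos])))
        return {EscapeValue::Invalid, 0};

    if (i == 0)
        return {EscapeValue::Character, 0}; // <NUL>, Unicode value 0000
    return {EscapeValue::Integer, i};
}
```

With this sketch, the text "0]" (as in [\0]) evaluates to the <NUL> character, "01" is rejected because of the lookahead rule, and "12" yields the integer 12.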