Note that this is not a duplicate of the many questions on StackOverflow concerning gcc; I'm using Visual Studio 2013.
This simple construction of a regular expression throws std::regex_error:
bool caseInsensitive = true;
char pattern[] = "\\bword\\b";
std::regex re(pattern, std::regex_constants::ECMAScript | (caseInsensitive ? std::regex_constants::icase : 0));
The actual error returned by what() on the exception object is not consistent. Usually it reports a mismatched parenthesis or brace. Why?
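To see exactly what the exception reports, one option is to catch std::regex_error and record both what() and code(). The following is only a sketch: CompileResult and tryCompile are hypothetical names, not part of the original code, and the text returned by what() is implementation-defined, so it will differ between compilers.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper type: records whether a pattern compiled and, if not,
// the error code and message carried by the exception.
struct CompileResult {
    bool ok;
    std::regex_constants::error_type code;
    std::string message;
};

// Hypothetical helper: compiles the pattern with default flags and captures
// any std::regex_error instead of letting it propagate.
inline CompileResult tryCompile(const char* pattern) {
    try {
        std::regex re(pattern);
        return {true, std::regex_constants::error_type{}, ""};
    } catch (const std::regex_error& e) {
        // The what() text is implementation-defined; code() is the portable part.
        return {false, e.code(), e.what()};
    }
}
```

Calling tryCompile("\\bword\\b") should report success, while a genuinely malformed pattern such as "(unclosed" should come back with ok set to false and a code from std::regex_constants.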
The problem arises because of the multiple constructors available for std::regex. Tracing into the constructor showed that it was using one I didn't intend!
I wanted to use this one:
explicit basic_regex(_In_z_ const _Elem *_Ptr,
flag_type _Flags = regex_constants::ECMAScript)
But I got this one instead:
basic_regex(_In_reads_(_Count) const _Elem *_Ptr, size_t _Count,
flag_type _Flags = regex_constants::ECMAScript)
The ternary expression in the flags causes the type of the whole argument to change to int, which no longer matches flag_type in the constructor signature, because an int does not implicitly convert to an enumeration. Since int does convert to size_t, the second constructor is called instead. The flags are misinterpreted as the length of the string, resulting in undefined behavior when the memory past the end of the string is accessed.
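The overload selection described here can be reproduced without the regex library. The sketch below uses made-up stand-ins (Flags, FakeRegex, makeWithTernaryZero, makeWithEnumOperands are all hypothetical names) that mirror the shape of the two constructors; it assumes the flag type is an unscoped enumeration, as it is in Visual Studio 2013.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for std::regex_constants::syntax_option_type.
enum Flags { kDefault = 1, kIcase = 2 };

// Mirrors the way a library overloads operator| for its flag enumeration.
inline Flags operator|(Flags a, Flags b) {
    return static_cast<Flags>(static_cast<int>(a) | static_cast<int>(b));
}

// Hypothetical stand-in for std::basic_regex with the same two overloads.
struct FakeRegex {
    bool usedCountOverload;
    std::size_t count;

    explicit FakeRegex(const char*, Flags = kDefault)
        : usedCountOverload(false), count(0) {}
    FakeRegex(const char*, std::size_t n, Flags = kDefault)
        : usedCountOverload(true), count(n) {}
};

inline FakeRegex makeWithTernaryZero(bool caseInsensitive) {
    // (cond ? kIcase : 0) has type int, so kDefault | int falls back to the
    // built-in operator| and yields int; only the (ptr, count) overload fits.
    return FakeRegex("\\bword\\b", kDefault | (caseInsensitive ? kIcase : 0));
}

inline FakeRegex makeWithEnumOperands(bool caseInsensitive) {
    // Both branches are Flags, so the intended (ptr, flags) overload is chosen.
    return FakeRegex("\\bword\\b", caseInsensitive ? kDefault | kIcase : kDefault);
}
```

In this model, makeWithTernaryZero(true) ends up in the count overload with count equal to 3 (the two flag values OR-ed together), which is the same mechanism that makes the real constructor treat the flags as a string length.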
The problem is not specific to Visual Studio. I was able to duplicate it in gcc: http://ideone.com/5DjYiz
It can be fixed in two ways. The first is an explicit cast of the argument:
std::regex re(pattern, static_cast<std::regex::flag_type>(std::regex_constants::ECMAScript | (caseInsensitive ? std::regex_constants::icase : 0)));
The second is to avoid integer constants in the ternary expression:
std::regex re(pattern, caseInsensitive ? std::regex_constants::ECMAScript | std::regex_constants::icase : std::regex_constants::ECMAScript);
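One way to check the types involved at compile time is with decltype and std::is_same. This is a sketch that assumes syntax_option_type is an enumeration with an overloaded operator|, which I believe is the case in MSVC, libstdc++ and libc++; the standard only requires a bitmask type, so the assertions could fail on another implementation. The alias and variable names are made up for illustration.

```cpp
#include <cassert>
#include <regex>
#include <type_traits>

namespace rc = std::regex_constants;
using flag_type = std::regex::flag_type;

// Type of the original argument: mixing the enum with 0 yields an integer type.
using BrokenArg = decltype(rc::ECMAScript | (true ? rc::icase : 0));

// Type of the second fix: both ternary branches are already flag_type.
using FixedArg = decltype(true ? rc::ECMAScript | rc::icase : rc::ECMAScript);

constexpr bool brokenArgIsFlagType = std::is_same<BrokenArg, flag_type>::value;
constexpr bool fixedArgIsFlagType = std::is_same<FixedArg, flag_type>::value;

static_assert(!brokenArgIsFlagType, "the 0 literal changes the argument type");
static_assert(fixedArgIsFlagType, "enum-only operands keep flag_type");
```

A static_assert like the second one, placed next to the call, would turn a silent wrong-overload bug into a compile error if someone later reintroduces an integer constant.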
I don't find either of the proposed solutions particularly compelling or aesthetically pleasing. I think I'd prefer something like this:
auto options = std::regex_constants::ECMAScript;
if (caseInsensitive)
    options |= std::regex_constants::icase;
std::regex re(pattern, options);
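Wrapped in a function, the same approach might look like the sketch below. makeWordRegex and containsWord are hypothetical names introduced only for this example; the pattern is the one from the question.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical wrapper: builds the flags step by step so the argument
// keeps the library's flag type and the (pattern, flags) overload is used.
inline std::regex makeWordRegex(bool caseInsensitive) {
    auto options = std::regex_constants::ECMAScript;
    if (caseInsensitive)
        options |= std::regex_constants::icase;
    return std::regex("\\bword\\b", options);
}

// Hypothetical convenience function: true if "word" occurs as a whole word.
inline bool containsWord(const std::string& text, bool caseInsensitive) {
    return std::regex_search(text, makeWordRegex(caseInsensitive));
}
```

With this, containsWord("A WORD here", true) should match, containsWord("A WORD here", false) should not, and "swordfish" should be rejected in both modes because of the \b word boundaries.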
If, for some misguided reason, you really insist on a single line of code, I'd use a value-constructed object of the correct type in the ternary expression:
std::regex re(pattern, std::regex_constants::ECMAScript | (caseInsensitive ? std::regex_constants::icase : std::regex_constants::syntax_option_type{}));
Or, since ECMAScript is the default grammar, you could use:
std::regex re(pattern, (caseInsensitive ? std::regex_constants::icase : std::regex_constants::ECMAScript));
At least to my eye, the first of these is clearly preferable though.