I am seeing different results when using the C POSIX regex library and the C++ standard library implementation. Here is my code:
string pattern = "\\s";
string testString = " ";
regex_t cre;
int status = regcomp(&cre, pattern.c_str(), REG_EXTENDED);
int result = (regexec(&cre, testString.c_str(), 0, 0, 0) == 0);
cout << "C: " << result << endl;
regex re(pattern, regex_constants::extended);
smatch sm;
cout << "C++: " << regex_search(testString, sm, re) << endl;
The C portion successfully matches the whitespace, but the C++ one throws this error:
terminate called after throwing an instance of 'std::regex_error'
what(): Unexpected escape character.
I understand that the string literal is escaped, meaning that the actual regex used in pattern matching should be \s. I also only see this issue when using POSIX extended grammar. In the C++ version, if I do not specify POSIX extended grammar when constructing the regex, it defaults to ECMAScript grammar and parses the pattern correctly.
What is going on here?
POSIX bracket expressions are a special kind of character class. They match one character out of a set of characters, just like regular character classes, and they use the same syntax with square brackets. A hyphen creates a range, and a caret at the start negates the bracket expression.
C++11 uses ECMAScript grammar as the default grammar for regex. ECMAScript is simple, yet it provides powerful regex capabilities.
An extended regular expression specifies a set of strings to be matched. The expression contains both text characters and operator characters. Text characters match the corresponding characters in the strings being compared. Operator characters specify repetitions, choices, and other features.
A regular expression is a pattern that the regular expression engine attempts to match in input text. A pattern consists of one or more character literals, operators, or constructs.
POSIX Extended Regular Expressions. The Extended Regular Expressions or ERE flavor standardizes a flavor similar to the one used by the UNIX egrep command. "Extended" is relative to the original UNIX grep, which only had bracket expressions, dot, caret, dollar and star.
C++ Regex 101 (published February 28, 2020): Since C++11, the C++ standard library contains the <regex> header, which lets you match strings against regular expressions (regexes). This greatly simplifies code that needs to perform such operations.
Regexes are often used to denote a standard textual syntax of a string. Each character in a regular expression is either a character with a literal meaning or a "metacharacter" that has a special meaning. For example, the regular expression "a[a-z]" matches 'aa', 'ab', 'ax', etc.
regex_constants::extended triggers the POSIX ERE regex syntax, which does not support shorthand character classes. Note that the C regex.h module supports \s as a non-standard extension.
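If you want to check how a given standard library treats a pattern under a particular grammar without terminating the program, you can catch std::regex_error. The following is a minimal sketch; compiles_under is a hypothetical helper name, not part of any library. Whether "\\s" is rejected under the extended grammar depends on the implementation (the question shows one that rejects it), so the sketch reports the outcome rather than assuming it.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not a library function): returns true if `pattern`
// compiles under the given grammar flag, false if the std::regex
// constructor throws std::regex_error.
bool compiles_under(const std::string& pattern,
                    std::regex::flag_type grammar) {
    try {
        std::regex re(pattern, grammar);
        (void)re;  // only the construction matters here
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}
```

Calling compiles_under("\\s", std::regex::extended) and compiles_under("\\s", std::regex::ECMAScript) side by side shows whether your implementation accepts the shorthand in each grammar.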
To match any whitespace in the POSIX ERE flavor enabled by regex_constants::extended, you need to use string pattern = "[[:space:]]".
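As a rough illustration of the POSIX class approach, here is a small sketch that searches for whitespace using the extended grammar. The function name has_whitespace_ere is made up for this example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `text` contains at least one whitespace
// character, using the POSIX ERE grammar and the [[:space:]] class
// instead of the \s shorthand.
bool has_whitespace_ere(const std::string& text) {
    static const std::regex re("[[:space:]]", std::regex::extended);
    return std::regex_search(text, re);
}
```

With this, has_whitespace_ere(" ") should report a match, while a string such as "abc" should not.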
However, you should just rely on the default ECMAScript flavor, and use
regex re(pattern);
// or
regex re(pattern, std::regex::ECMAScript);
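For completeness, a sketch of the same check with the default ECMAScript grammar, where \s is supported. It uses a C++11 raw string literal so the backslash does not need to be doubled; has_whitespace_ecma is a hypothetical name chosen for this example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `text` contains at least one whitespace
// character, using the default ECMAScript grammar and the \s shorthand.
bool has_whitespace_ecma(const std::string& text) {
    // R"(\s)" is a raw string literal, equivalent to "\\s".
    static const std::regex re(R"(\s)");
    return std::regex_search(text, re);
}
```

This version behaves like the C regexec call in the question for the test string " ".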