When there is a \r in the string being matched, std::regex and boost::regex behave differently. Why?
Code:
#include <iostream>
#include <string>
#include <regex>
#include <boost/regex.hpp>

int main()
{
    std::string content = "123456728\r,234";
    std::string regex_string = "2.*?4";
    std::cout << "content size:" << content.size() << std::endl;

    // Boost matches "234" and "28\r,234"
    boost::regex reg(regex_string);
    boost::sregex_iterator it(content.begin(), content.end(), reg);
    boost::sregex_iterator end;
    while (it != end)
    {
        std::cout << "boost match: " << it->str(0) << " size: " << it->str(0).size() << std::endl;
        ++it;
    }

    // std::regex matches "234" and "234"
    std::regex regex_std(regex_string);
    std::sregex_iterator it_std(content.begin(), content.end(), regex_std);
    std::sregex_iterator std_end;
    while (it_std != std_end)
    {
        std::cout << "std match: " << it_std->str(0) << " size: " << it_std->str(0).size() << std::endl;
        ++it_std;
    }
    return 0;
}
I think the boost library behaves normally, but I don't understand why the standard library is implemented this way.
That is expected.
The default flavor of std::regex is ECMAScript (ECMA-262), and in ECMAScript the . character matches any character except a LineTerminator:
The production Atom :: . evaluates as follows:
- Let A be the set of all characters except LineTerminator.
- Call CharacterSetMatcher(A, false) and return its Matcher result.
And then section 7.3, Line Terminators, says:
Line terminators are included in the set of white space characters that are matched by the \s class in regular expressions.
Code Unit Value | Name | Formal Name |
---|---|---|
\u000A | Line Feed | <LF> |
\u000D | Carriage Return | <CR> |
\u2028 | Line separator | <LS> |
\u2029 | Paragraph separator | <PS> |
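To see the two quoted rules in action, here is a minimal sketch that tests a lone . and a lone \s against single characters with std::regex. The helper names dot_matches and space_class_matches are made up for this illustration, and the behaviour described assumes a standard library that follows the ECMAScript rule for \r and \n, as the output in the question suggests.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if a lone `.` matches the single character c
// under std::regex's default ECMAScript grammar.
bool dot_matches(char c)
{
    static const std::regex dot(".");
    return std::regex_match(std::string(1, c), dot);
}

// Hypothetical helper: true if the \s class matches the single character c.
bool space_class_matches(char c)
{
    static const std::regex space("\\s");
    return std::regex_match(std::string(1, c), space);
}
```

With these, dot_matches('\r') and dot_matches('\n') should be false while dot_matches(',') is true, and space_class_matches('\r') should be true, since line terminators count as white space.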
In Boost regex, however, . matches any single character except:
- The NULL character when the flag match_not_dot_null is passed to the matching algorithms.
- The newline character when the flag match_not_dot_newline is passed to the matching algorithms.
So . matches \r in Boost regex, but in std::regex it does not.
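If you want std::regex to behave like Boost here, one common workaround (not something the standard mandates, just a pattern-level trick) is to replace . with a class that also covers line terminators, such as [\s\S]. The sketch below assumes a hypothetical helper find_all that collects all matches of a pattern.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect every match of `pattern` in `text`
// using std::regex with its default ECMAScript grammar.
std::vector<std::string> find_all(const std::string& text, const std::string& pattern)
{
    const std::regex re(pattern);
    std::vector<std::string> matches;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it)
        matches.push_back(it->str());
    return matches;
}
```

On the string from the question, find_all(content, "2.*?4") should give 234 and 234, while find_all(content, "2[\\s\\S]*?4") should give 234 and 28\r,234, the same matches Boost reports.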