Inconsistency between boost::regex and std::regex [duplicate]

Possible Duplicate:
No matches with c++11 regex

I had been using boost::regex for some existing code, and for new code I wanted to use std::regex, until I noticed the following inconsistency. So the question is: which one is correct?

#include <iostream>
#include <regex>
#include <string>

#include <boost/regex.hpp>

void test(std::string prefix, std::string str)
{
  std::string pat = prefix + "\\.\\*.*?";

  std::cout << "Input   : [" << str << "]" << std::endl;
  std::cout << "Pattern : [" << pat << "]" << std::endl;

  {
    std::regex r(pat);
    if (std::regex_match(str, r))
      std::cout << "std::regex_match: true" << std::endl;
    else
      std::cout << "std::regex_match: false" << std::endl;

    if (std::regex_search(str, r))
      std::cout << "std::regex_search: true" << std::endl;
    else
      std::cout << "std::regex_search: false" << std::endl;
  }

  {
    boost::regex r(pat);
    if (boost::regex_match(str, r))
      std::cout << "boost::regex_match: true" << std::endl;
    else
      std::cout << "boost::regex_match: false" << std::endl;

    if (boost::regex_search(str, r))
      std::cout << "boost::regex_search: true" << std::endl;
    else
      std::cout << "boost::regex_search: false" << std::endl;
  }
}

int main(void)
{
  test("FOO", "FOO.*");
  test("FOO", "FOO.*.*.*.*");
}

For me (gcc 4.7.2, -std=c++11, boost: 1.51), I see the following:

Input   : [FOO.*]
Pattern : [FOO\.\*.*?]
std::regex_match: false
std::regex_search: false
boost::regex_match: true
boost::regex_search: true
Input   : [FOO.*.*.*.*]
Pattern : [FOO\.\*.*?]
std::regex_match: false
std::regex_search: false
boost::regex_match: true
boost::regex_search: true

If I change the pattern to a greedy pattern (.*), then I see:

Input   : [FOO.*]
Pattern : [FOO\.\*.*]
std::regex_match: true
std::regex_search: false
boost::regex_match: true
boost::regex_search: true
Input   : [FOO.*.*.*.*]
Pattern : [FOO\.\*.*]
std::regex_match: true
std::regex_search: false
boost::regex_match: true
boost::regex_search: true

Which one should I believe? My guess is that boost is correct here.

Nim asked Nov 23 '12 10:11

1 Answer

gcc 4.7's standard library does not actually implement the TR1/C++11 regex library yet, so its std::regex results above are not meaningful. To give a more general answer: boost.regex's default syntax is Perl 5, according to its documentation, while the C++ default is ECMAScript, extended by several locale-dependent elements of POSIX BRE.

Specifically, boost.regex supports the Perl extensions listed in its documentation, but you're not using any of those.
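To make the grammar point concrete, here is a minimal sketch using only the standard library. The helper names make_ecmascript_regex and default_grammar_is_ecmascript are made up for illustration; the pattern is the one from the question. It assumes a standard library with a working <regex>, which gcc 4.7 does not have.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: builds the question's pattern with the grammar spelled out.
// std::regex::ECMAScript is already the default, so naming it only documents the choice.
inline std::regex make_ecmascript_regex(const std::string& prefix)
{
  return std::regex(prefix + "\\.\\*.*?", std::regex::ECMAScript);
}

// Hypothetical helper: checks that a regex built without an explicit
// grammar flag reports ECMAScript in its flags().
inline bool default_grammar_is_ecmascript()
{
  std::regex r("a");
  return (r.flags() & std::regex::ECMAScript) == std::regex::ECMAScript;
}
```

Since the lazy quantifier .*? exists in both ECMAScript and Perl syntax, the difference in default grammar should not, by itself, explain the different results for this pattern.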

Now, I got curious and ran your test through two more compilers:

Output from clang:

~ $ clang++ -o test test.cc -std=c++11 -I/usr/include/c++/v1 -lc++ -lboost_regex
~ $ ./test
Input   : [FOO.*]
Pattern : [FOO\.\*.*?]
std::regex_match: true
std::regex_search: true
boost::regex_match: true
boost::regex_search: true
Input   : [FOO.*.*.*.*]
Pattern : [FOO\.\*.*?]
std::regex_match: false
std::regex_search: true
boost::regex_match: true
boost::regex_search: true

Output from Visual Studio 2012 (sans boost):

Input   : [FOO.*]
Pattern : [FOO\.\*.*?]
std::regex_match: true
std::regex_search: true
Input   : [FOO.*.*.*.*]
Pattern : [FOO\.\*.*?]
std::regex_match: true
std::regex_search: true

Looking closer at clang's discrepancy: in the second test it matched the pattern [FOO\.\*.*?] against [FOO.*] and left [.*.*.*] unmatched. That quickly boils down to matching [S*?] differently from boost and Visual Studio, which, I think, is a bug too.
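The reason stopping early is wrong for regex_match is that regex_match has to consume the entire input, so the lazy .*? must expand as far as needed, whereas regex_search may stop at the shortest match. A minimal sketch of that difference, assuming a conforming <regex> implementation (the helper names question_regex, searched_text and matches_whole are made up for illustration):

```cpp
#include <regex>
#include <string>

// Hypothetical helper: the pattern from the question, with prefix "FOO".
inline const std::regex& question_regex()
{
  static const std::regex r("FOO\\.\\*.*?");
  return r;
}

// Hypothetical helper: returns the text regex_search consumed,
// or an empty string if nothing matched.
inline std::string searched_text(const std::string& input)
{
  std::smatch m;
  return std::regex_search(input, m, question_regex()) ? m.str() : std::string();
}

// Hypothetical helper: regex_match succeeds only if the whole input is
// consumed, so the lazy quantifier has to grow to cover the tail.
inline bool matches_whole(const std::string& input)
{
  return std::regex_match(input, question_regex());
}
```

On a conforming implementation, searched_text("FOO.*.*.*.*") yields only FOO.* because the lazy quantifier is allowed to match nothing, while matches_whole on the same input is true. Returning false there is what the clang output above does.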

Cubbi answered Sep 20 '22 01:09