
extracting original regex pattern from std::regex

I have a function that tries to match a given string against a given regex pattern. If the string does not match, the function should build an error message that includes both the pattern that failed and the content of the string. Something like this:

bool validate_content(const std::string & str, const std::regex & pattern, std::vector<std::string> & errors)
{
    if ( false == std::regex_match(str, pattern) )
    {
        std::stringstream error_str;
        // error_str << "Pattern match failure: " << pattern << ", content: " << str;
        errors.push_back(error_str.str());
        return false;
    }
    return true;
}

However, as the commented-out line shows, there is a problem: is it possible to recover the original pattern from the regex object?

The obvious workaround is to pass the original pattern string instead of, or alongside, the regex object and use that. I would prefer to avoid both options, though. Passing only the string means recreating the regex object, and paying to reparse the pattern, on every call. Passing the string along with the regex object is prone to typos and mismatches unless I write a wrapper that keeps the two together, which is less convenient.

I'm using GCC 4.9.2 on Ubuntu 14.04.

inetknght asked Jun 22 '15 22:06


2 Answers

boost::basic_regex objects have a str() function which returns a (copy of) the character string used to construct the regular expression. (They also provide begin() and end() interfaces which return iterators to the character sequence, as well as a mechanism for introspecting capture subexpressions.)

These interfaces were in the initial TR1 regex standardization proposal, but were removed in 2003, after the adoption of n1499: Simplifying Interfaces in basic_regex, from which I quote:

basic_regex Should Not Keep a Copy of its Initializer

The basic_regex template has a member function str which returns a string object that holds the text used to initialize the basic_regex object… While it might occasionally be useful to look at the initializer string, we ought to apply the rule that you don't pay for it if you don't use it. Just as fstream objects don't carry around the file name that they were opened with, basic_regex objects should not carry around their initializer text. If someone needs to keep track of that text they can write a class that holds the text and the basic_regex object.
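The last sentence of that quote could be sketched roughly as follows. The class name named_regex and its members str() and get() are made up for this illustration and are not part of the standard library or Boost:

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical wrapper: keeps the pattern text next to the regex
// that was compiled from it.
class named_regex {
public:
    explicit named_regex(std::string pattern)
        : text_(std::move(pattern)), compiled_(text_) {}

    // The text the regex was constructed from.
    const std::string & str() const { return text_; }

    // The compiled regex, for use with std::regex_match and friends.
    const std::regex & get() const { return compiled_; }

private:
    std::string text_;      // declared first, so it is initialized before compiled_
    std::regex  compiled_;
};
```

Because both members are built from a single string, the stored text cannot drift away from the compiled pattern. A function like validate_content could then take a const named_regex & and print pattern.str() in its error message.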

rici answered Oct 14 '22 17:10


According to the draft standard N4431, §28.8/2 Class template basic_regex [re.regex] (emphasis mine):

Objects of type specialization of basic_regex are responsible for converting the sequence of charT objects to an internal representation. It is not specified what form this representation takes, nor how it is accessed by algorithms that operate on regular expressions. [ Note: Implementations will typically declare some function templates as friends of basic_regex to achieve this — end note ]

Thus, a basic_regex object is not required to keep the original character sequence internally.

Consequently, you must store the character sequence yourself when you create the regex. For example:

struct RegexPattern {
  std::string pattern;
  std::regex  reg;
};
...
bool validate_content(const std::string & str, const RegexPattern & pattern, std::vector<std::string> & errors) {
    if(false == std::regex_match(str, pattern.reg)) {
        std::stringstream error_str;
        error_str << "Pattern match failure: " << pattern.pattern << ", content: " << str;
        errors.push_back(error_str.str());
        return false;
    }
    return true;
}

Another, more elegant but possibly less efficient solution (I haven't benchmarked the two versions, so I'm not sure) was proposed by @Praetorian: keep the pattern string, pass it to validate_content as an input argument, and create the regex object inside the function, as shown below:

bool validate_content(const std::string & str, const std::string & pattern, std::vector<std::string> & errors) {
    std::regex reg(pattern);
    if(false == std::regex_match(str, reg)) {
        std::stringstream error_str;
        error_str << "Pattern match failure: " << pattern << ", content: " << str;
        errors.push_back(error_str.str());
        return false;
    }
    return true;
}
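If recompiling on every call turns out to be too expensive, one possible mitigation is to cache compiled regexes by pattern string. This is an assumption on my part, not something from @Praetorian's suggestion. The names cached_regex and validate_content_cached are hypothetical, and the cache in this sketch is not thread-safe:

```cpp
#include <map>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns a compiled regex for `pattern`, compiling it
// only the first time that pattern text is seen. Not thread-safe as written.
const std::regex & cached_regex(const std::string & pattern) {
    static std::map<std::string, std::regex> cache;
    auto it = cache.find(pattern);
    if (it == cache.end()) {
        it = cache.emplace(pattern, std::regex(pattern)).first;
    }
    return it->second;  // std::map references stay valid after later inserts
}

// Same behavior as validate_content, but looks the regex up in the cache.
bool validate_content_cached(const std::string & str, const std::string & pattern, std::vector<std::string> & errors) {
    if (!std::regex_match(str, cached_regex(pattern))) {
        std::ostringstream error_str;
        error_str << "Pattern match failure: " << pattern << ", content: " << str;
        errors.push_back(error_str.str());
        return false;
    }
    return true;
}
```

This trades a map lookup per call for the cost of reparsing the pattern. Whether that is a net win would need measuring for the patterns in question.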
101010 answered Oct 14 '22 18:10