
C++::Boost::Regex Iterate over the submatches

Tags: c++, regex, boost

I am using Named Capture Groups with Boost Regex / Xpressive.

I would like to iterate over all submatches and get both the value and the key (the group name) of each one, e.g. the "type" in what["type"].

sregex pattern = sregex::compile("(?P<type>href|src)=\"(?P<url>[^\"]+)\"");

sregex_iterator cur( web_buffer.begin(), web_buffer.end(), pattern );
sregex_iterator end;

for( ; cur != end; ++cur ){
    smatch const &what = *cur;

    //I know how to access using a string key: what["type"]
    std::cout << what[0] << " [" << what["type"] << "] [" << what["url"] <<"]"<< std::endl;

    /*I know how to iterate, using an integer key, but I would
      like to also get the original KEY into a variable, i.e.
      in case of what[1], get both the value AND "type"
    */
    for (std::size_t i = 0; i < what.size(); ++i) {
        std::cout << "{} = [" << what[i] << "]" << std::endl;
    }

    std::cout << std::endl;
}
Michael asked Apr 27 '10 03:04


2 Answers

With Boost 1.54.0 this is even more difficult because the capture names are not even stored in the results. Instead, Boost just hashes the capture names and stores the hash (an int) and the associated pointers to the original string.

I've written a small class derived from boost::smatch that saves capture names and provides an iterator for them.

class namesaving_smatch : public smatch
{
public:
    explicit namesaving_smatch(const regex& pattern)
    {
        // Re-scan the pattern text for "?P<name>" or "?<name>" and
        // remember each name in the order it appears.
        std::string pattern_str = pattern.str();
        regex capture_pattern("\\?P?<(\\w+)>");
        auto words_begin = sregex_iterator(pattern_str.begin(), pattern_str.end(), capture_pattern);
        auto words_end = sregex_iterator();

        for (sregex_iterator i = words_begin; i != words_end; ++i)
        {
            m_names.push_back((*i)[1].str());
        }
    }

    ~namesaving_smatch() { }

    std::vector<std::string>::const_iterator names_begin() const
    {
        return m_names.begin();
    }

    std::vector<std::string>::const_iterator names_end() const
    {
        return m_names.end();
    }

private:
    std::vector<std::string> m_names;
};

The class accepts the regular expression containing the named capture groups in its constructor. Use the class like so:

namesaving_smatch results(re);
if (regex_search(input, results, re))
    for (auto it = results.names_begin(); it != results.names_end(); ++it)
        cout << *it << ": " << results[*it].str() << '\n';
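
One caveat with the class above: the names come back in pattern order, which lines up with the submatch numbers only when every capturing group is named. Below is a minimal sketch of one way to recover the group number as well, by scanning the pattern text and counting capturing groups. named_group_indices is a hypothetical helper, not part of Boost, and it uses only the standard library. It assumes a plain Perl-style pattern: no free-spacing comments, no branch reset groups, no \Q...\E quoting, and no literal ']' placed first in a character class.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not a Boost API): returns (group number, name) for
// every named capture group in a Perl-style pattern. Unnamed capturing
// groups are counted too, so the numbers line up with what[1], what[2], ...
std::vector<std::pair<std::size_t, std::string>>
named_group_indices(const std::string& pattern)
{
    std::vector<std::pair<std::size_t, std::string>> result;
    std::size_t group = 0;   // capturing groups seen so far
    bool in_class = false;   // currently inside a [...] character class

    for (std::size_t i = 0; i < pattern.size(); ++i) {
        const char c = pattern[i];
        if (c == '\\') { ++i; continue; }   // skip the escaped character
        if (in_class) { if (c == ']') in_class = false; continue; }
        if (c == '[') { in_class = true; continue; }
        if (c != '(') continue;

        if (i + 1 >= pattern.size() || pattern[i + 1] != '?') {
            ++group;                        // plain (unnamed) capturing group
            continue;
        }
        // "(?P<name>" or "(?<name>" opens a named group. Every other
        // "(?..." form is treated as non-capturing in this sketch.
        std::size_t j = i + 2;
        if (j < pattern.size() && pattern[j] == 'P') ++j;
        if (j >= pattern.size() || pattern[j] != '<') continue;

        std::size_t k = j + 1;
        while (k < pattern.size() &&
               (std::isalnum(static_cast<unsigned char>(pattern[k])) ||
                pattern[k] == '_')) {
            ++k;
        }
        // An empty name or a missing '>' means this is something else,
        // for example a lookbehind such as "(?<=" or "(?<!".
        if (k == j + 1 || k >= pattern.size() || pattern[k] != '>') continue;

        result.emplace_back(++group, pattern.substr(j + 1, k - j - 1));
    }
    return result;
}
```

For the pattern in the question this yields (1, "type") and (2, "url"), so inside the match loop what[p.first] is the value and p.second is the key for each pair p.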
ladenedge answered Nov 05 '22 22:11


After looking at this for more than an hour, I feel fairly safe saying, "it can't be done, captain". Even inside the Boost code, the lookup by name iterates over the private named_marks_ vector, and nothing in the public interface exposes that vector, so the library is simply not set up to allow enumerating the names. I'd say the best bet is to iterate over the names you think should be there and catch the exception for those that aren't found.

const_reference at_(char_type const *name) const
{
    for(std::size_t i = 0; i < this->named_marks_.size(); ++i)
    {
        if(this->named_marks_[i].name_ == name)
        {
            return this->sub_matches_[ this->named_marks_[i].mark_nbr_ ];
        }
    }
    BOOST_THROW_EXCEPTION(
        regex_error(regex_constants::error_badmark, "invalid named back-reference")
    );
    // Should never execute, but if it does, this returns
    // a "null" sub_match.
    return this->sub_matches_[this->sub_matches_.size()];
}
boatcoder answered Nov 05 '22 20:11