
Regular expression capturing repeated groups

Tags: c++, regex, boost

I have been trying to figure out how to capture these groups properly, without success: http://www.debuggex.com/r/xOmFR78EkK3mATN4/0

In the example, I need to capture each individual part of the expression: test1 == 0, test2 == 1, and test3 == 2. Right now I only match test1 and test3, and I can't figure out how to get all of the expressions matched.

I will be using C++ and Boost.Regex, though that shouldn't change anything.

csteifel asked Feb 18 '26 07:02



2 Answers

I think test2 isn't showing up because it is captured by group 7, and the contents of group 7 are then overwritten when the same group matches test3.
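The overwriting can be reproduced with a small sketch. It uses std::regex rather than Boost.Regex so that it compiles without Boost, and the pattern is a simplified, hypothetical stand-in for the one behind the debuggex link (last_repeated_capture is an illustrative name, not part of any library):

```cpp
#include <regex>
#include <string>

// Returns what the repeated capture group holds after a successful match,
// or an empty string if the input does not match at all.
// The pattern is a simplified, hypothetical stand-in for the question's regex.
std::string last_repeated_capture(const std::string& input)
{
    static const std::regex rule(
        R"(^(\w+):\s+(\w+ == \w+)(?:\s&&\s(\w+ == \w+))*\s*;.*$)");

    std::smatch what;
    if (!std::regex_match(input, what, rule))
        return std::string();

    // Group 3 sits inside the repeated (?: ... )* group. Every repetition
    // replaces its previous contents, so only the final repetition survives.
    return what[3].str();
}
```

For the string in the question, group 2 holds "test1 == 0" and group 3 ends up holding only "test3 == 2", which would explain why test2 seems to go missing.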

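For Boost.Regex, look at the documentation for match_flag_type, specifically match_extra, which asks the engine to retain every capture made by a repeated group instead of only the last one.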
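If enabling match_extra is not an option, a different technique is to search for each comparison separately instead of capturing them all in one match. The sketch below uses std::sregex_iterator from the standard library rather than Boost.Regex, and find_comparisons is a hypothetical helper name. Note that it does not validate the overall shape of the line (the leading name, the "&&" separators, the trailing ";"); it only collects the comparisons.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects every "lhs op rhs" comparison found in the input, in order.
// This only searches for comparisons; it does not check the rule's overall layout.
std::vector<std::string> find_comparisons(const std::string& input)
{
    static const std::regex comparison(R"([\w.]+\s+(?:==|!=|>|<)\s+[\w.]+)");

    std::vector<std::string> found;
    // A default-constructed sregex_iterator acts as the end-of-sequence marker.
    for (std::sregex_iterator it(input.begin(), input.end(), comparison), end;
         it != end; ++it)
    {
        found.push_back(it->str());
    }
    return found;
}
```

Called on the string from the question, this should return the three comparisons as separate entries.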

Eric Finn answered Feb 20 '26 21:02



You can use Boost.Xpressive for this:

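#include <iostream>
#include <string>
#include <boost/xpressive/xpressive.hpp>

using namespace boost::xpressive;

int main()
{
    std::string str( "testrule: test1 == 0 && test2 == 1 && test3 == 2 ; test desc" );

    sregex_compiler comp;
    // ignore_white_space makes unescaped white space in the patterns insignificant.
    regex_constants::syntax_option_type x = regex_constants::ignore_white_space;
    // Define a named regex, $test, that matches one comparison such as "test1 == 0".
    comp.compile("(? $test = )(([\\w\\.]+)\\s+(==|!=|>|<)\\s+([\\w\\.]+))", x);
    // Reference $test once, then zero or more times, each preceded by "&&".
    sregex test = comp.compile("^(\\w+):\\s+(? $test )(\\s&&\\s(? $test ))*\\s*;\\s*(.*)$", x);

    smatch what;
    if(regex_match(str, what, test))
    {
        // Each match of the nested $test regex is kept as a nested result.
        // The range-based for loop requires C++11.
        for(smatch const & nested : what.nested_results())
            std::cout << nested[0].str() << std::endl;
    }
}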

This program prints the following:

test1 == 0
test2 == 1
test3 == 2

It makes strategic use of nested dynamic regexes, which I don't believe Boost.Regex supports. The good news is that if you have Boost, the above should Just Work. Xpressive is a header-only library; that is, it doesn't need to be built.

You can make this far more efficient using Xpressive's semantic actions. That's not harder, but does forgo much of the regex syntax you're clearly familiar with.

Another option would be to build a simple parser using Boost.Spirit, which is also header only.
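To give a feel for the parser approach without pulling in Boost, here is a minimal hand-written sketch in standard C++. It is not Boost.Spirit, just a token-by-token reader; Condition, Rule and parse_rule are hypothetical names, and it assumes tokens are separated by white space as in the example string (a ";" glued to the last value is also accepted).

```cpp
#include <sstream>
#include <string>
#include <vector>

struct Condition { std::string lhs, op, rhs; };

struct Rule {
    std::string name;
    std::vector<Condition> conditions;
    std::string description;
};

// Parses "name: a == 0 && b != 1 ; description" into out.
// Returns false if the line does not follow that layout.
bool parse_rule(const std::string& line, Rule& out)
{
    std::istringstream in(line);
    Rule rule;

    // The first token is the rule name followed directly by ':'.
    if (!(in >> rule.name) || rule.name.size() < 2 || rule.name.back() != ':')
        return false;
    rule.name.pop_back();

    for (;;) {
        Condition c;
        if (!(in >> c.lhs >> c.op >> c.rhs))
            return false;
        if (c.op != "==" && c.op != "!=" && c.op != ">" && c.op != "<")
            return false;

        // Accept a terminating ';' attached to the value, e.g. "2;".
        bool terminated = false;
        if (c.rhs.size() > 1 && c.rhs.back() == ';') {
            c.rhs.pop_back();
            terminated = true;
        }
        rule.conditions.push_back(c);
        if (terminated)
            break;

        // Otherwise the next token must be "&&" (another condition) or ";" (end).
        std::string sep;
        if (!(in >> sep))
            return false;
        if (sep == ";")
            break;
        if (sep != "&&")
            return false;
    }

    // Whatever follows the ';' is the free-form description (may be empty).
    std::getline(in >> std::ws, rule.description);
    out = rule;
    return true;
}
```

Unlike a single regex match, every condition is stored in its own Condition, so nothing is overwritten by later repetitions. A Spirit grammar would express the same structure declaratively.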

HTH!

Eric Niebler answered Feb 20 '26 22:02



