Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why did boost regex run out of stack space?

#include <boost/regex.hpp>
#include <string>
#include <iostream>

using namespace boost;
static const regex regexp(
        "std::vector<"
            "(std::map<"
                   "(std::pair<((\\w+)(::)?)+, (\\w+)>,?)+"
             ">,?)+"
        ">");

std::string errorMsg =
"std::vector<"
        "std::map<"
                "std::pair<Test::Test, int>,"
                "std::pair<Test::Test, int>,"
                "std::pair<Test::Test, int>"
        ">,"
        "std::map<"
                "std::pair<Test::Test, int>,"
                "std::pair<Test::Test, int>,"
                "std::pair<Test::Test, int>"
        ">"
">";
int main()
{
    smatch result;
    if(regex_match(errorMsg, result, regexp))
    {  
        for (unsigned i = 0; i < result.size(); ++i)
        {  
            std::cout << result[i] << std::endl;
        }
    }

//    std::cout << errorMsg << std::endl;

    return 0;
}

this produces:

terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::runtime_error>
>'   what():  Ran out of stack space trying to match the regular expression.

compiled with

g++ regex.cc -lboost_regex

EDIT

my platform:

g++ (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5
libboost-regex1.42
Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz
So the latest Ubuntu 64 bit
like image 385
Industrial-antidepressant Avatar asked Dec 21 '10 02:12

Industrial-antidepressant


2 Answers

((\\w+)(::)?)+ is one of the so called "pathological" regular expressions -- it's going to take exponential time, because you have two expressions which are dependent upon each other one right after the other. That is, it fails due to catastrophic backtracking.

Consider if we follow the example of the link, and reduce "something more complicated" to "x". Let's do that with \\w:

  • ((x+)(::)?)+

Let's also assume that our input is never going to have ::. Having this actually makes the regex more complex, so if we throw out complexity then we really should be making things simpler if nothing else:

  • (x+)+

Now you've got a textbook nested quantifier problem like that detailed in the link above.

There are a few ways to fix this but the simplest way is probably to just disallow backtracking on the inner match using the atomic group modifier "(?>":

  • ((?>\\w+)(::)?)+
like image 110
Billy ONeal Avatar answered Oct 18 '22 16:10

Billy ONeal


Tested it locally and it worked fine, my guess is your compiler is doing something weird.

What version of gcc? what platform? which version of boost?

 -> ./regex
std::vector<std::map<std::pair<Test::Test, int>,std::pair<Test::Test, int>,std::pair<Test::Test, int>>,std::map<std::pair<Test::Test, int>,std::pair<Test::Test, int>,std::pair<Test::Test, int>>>
std::map<std::pair<Test::Test, int>,std::pair<Test::Test, int>,std::pair<Test::Test, int>>
std::pair<Test::Test, int>
Test
Test
::
int
like image 31
OneOfOne Avatar answered Oct 18 '22 16:10

OneOfOne