I am trying to test (using boost::regex
) whether a line in a file contains only numeric entries seperated by spaces. I encountered an exception which I do not understand (see below). It would be great if someone could explain why it is thrown. Maybe I am doing something stupid here in my way of defining the patterns? Here is the code:
// regex_test.cpp
#include <string>
#include <iostream>
#include <boost/regex.hpp>
using namespace std;
using namespace boost;
int main(){
// My basic pattern to test for a single numeric expression
const string numeric_value_pattern = "(?:-|\\+)?[[:d:]]+\\.?[[:d:]]*";
// pattern for the full line
const string numeric_sequence_pattern = "([[:s:]]*"+numeric_value_pattern+"[[:s:]]*)+";
regex r(numeric_sequence_pattern);
string line= "1 2 3 4.444444444444";
bool match = regex_match(line, r);
cout<<match<<endl;
//...
}
I compile that successfully with
g++ -std=c++11 -L/usr/lib64/ -lboost_regex regex_test.cpp
The resulting program worked fine so far and match == true
as I wanted. But then I test an input line like
string line= "1 2 3 4.44444444e-16";
Of course, my pattern isn't built to recognise the format 4.44444444e-16
and I would expect that match == false
. However, instead I get the following runtime error:
terminate called after throwing an instance of
'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::runtime_error> >'
what(): The complexity of matching the regular expression exceeded predefined bounds.
Try refactoring the regular expression to make each choice made by the state machine unambiguous.
This exception is thrown to prevent "eternal" matches that take an indefinite period time to locate.
Why is that?
Note: the example I gave is extremal in the sense that putting one digit less after the dot works ok. That means
string line= "1 2 3 4.4444444e-16";
just results in match == false
as expected. So, I'm baffled. What is happening here?
Thanks already!
Update:
Problem seems to be solved. Given the hint of alejrb I refactored the pattern to
const string numeric_value_pattern = "(?:-|\\+)?[[:d:]]+(?:\\.[[:d:]]*)?";
That seems to work as it should. Somehow, the isolated optional \\.
inside the original pattern [[:d:]]+\\.?[[:d:]]*
left to many possibilities to match a long sequence of digits in different ways.
I hope the pattern is safe now. However, if someone finds a way to use it for a blow up in the new form, let me know! It's not so obvious for me whether that might still be possible...
I'd say that your regex is probably exponentially backtracking. To protect you from a loop that would become entirely unworkable if the input were any longer, the regex engine just aborts the attempt.
One of the patterns that often causes this problem is anything of the form (x+x+)+
- which you build up here when you place the first pattern inside the second.
There's a good discussion at http://www.regular-expressions.info/catastrophic.html
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With