I'm trying to find an efficient way to greedily find the first match for a std::regex without analyzing the whole input.
My specific problem is that I wrote a handmade lexer and I'm trying to provide rules to parse common literal values (e.g. a numeric value).
So suppose a simple rule, let's say:
std::regex integralRegex = std::regex("([+-]?[1-9]*[0-9]+)");
Is there a way to find the longest match starting from the beginning of input without scanning all of it? It looks like std::regex_match tries to match the whole input while std::regex_search forcefully finds all matches.
Maybe I'm missing a trivial overload for my purpose but I can't find an efficient solution to the problem.
Just to clarify the question: I'm not interested in stopping after the first sub-match and ignoring the remainder of the input; rather, for an input like "51+12*3" I'd like something that finds the first match, 51, and then stops, ignoring whatever comes after it.
First of all, I think [+-]?[1-9]?[0-9]+ does the same thing, but it should be a bit faster. Or perhaps you intended something like this: [+-]?[1-9][0-9]*|0 (an unsigned zero, or a number that does not start with zero).
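To see how the three patterns differ, here is a minimal sketch that tests whole strings with std::regex_match. The helper name matchesWhole is made up for illustration, and the sample inputs are my own, not from the question.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `text` as a whole matches `pattern`
// (default ECMAScript grammar).
bool matchesWhole(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);
    return std::regex_match(text, re);
}

// The three patterns discussed above.
const std::string kOriginal   = "[+-]?[1-9]*[0-9]+";    // from the question
const std::string kSimplified = "[+-]?[1-9]?[0-9]+";    // accepts the same strings
const std::string kStrict     = "[+-]?[1-9][0-9]*|0";   // no leading zeros, unsigned zero
```

With these definitions, matchesWhole(kOriginal, "007") and matchesWhole(kSimplified, "007") both accept the input, while matchesWhole(kStrict, "007") rejects it. The strict pattern still accepts "0" and "-12".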
Secondly, C++ provides a regular expression iterator, std::sregex_iterator:
#include <iostream>
#include <iterator>
#include <regex>
#include <string>

int main() {
    const std::string s = "51+12*3";
    std::regex number_regex("[+-]?[1-9]?[0-9]+");
    // Iterate over every non-overlapping match in s.
    auto words_begin = std::sregex_iterator(s.begin(), s.end(), number_regex);
    auto words_end = std::sregex_iterator();
    std::cout << "Found " << std::distance(words_begin, words_end) << " numbers:\n";
    for (std::sregex_iterator i = words_begin; i != words_end; ++i) {
        std::smatch match = *i;
        std::cout << match.str() << '\n';
    }
}
This looks like what you need.
https://wandbox.org/permlink/tkaAfIslkWeY2poo
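If only the first token at the current lexer position is wanted, another option is std::regex_search with the std::regex_constants::match_continuous flag, which only accepts a match that begins at the start of the given range. Below is a minimal sketch. The function name firstTokenAt and the convention of returning an empty string on failure are assumptions for illustration, not part of the original answer.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: returns the match that starts exactly at `pos`,
// or an empty string if the pattern does not match there.
// Assumes pos <= input.size() and that the pattern never matches an
// empty string (true for the integer pattern used here).
std::string firstTokenAt(const std::string& input,
                         const std::regex& re,
                         std::size_t pos = 0) {
    std::smatch m;
    // match_continuous anchors the match to the first iterator, so the
    // search does not move forward through the input looking for a
    // later match.
    if (std::regex_search(input.begin() + pos, input.end(), m, re,
                          std::regex_constants::match_continuous)) {
        return m.str();
    }
    return std::string();
}
```

For the input "51+12*3" and the pattern [+-]?[1-9]?[0-9]+, firstTokenAt(input, re) yields "51", and firstTokenAt(input, re, 2) yields "+12". A lexer would advance its position by the length of the returned token and then try its next rule.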