
How to get all possible matches of std::regex

Tags: c++, regex, c++11, stl

I would like to find all possible matches of a regex, including overlapping ones. How can I do that?

regex rx("(2|25)");
string s = "2225";
for (sregex_iterator it(s.begin(), s.end(), rx), end; it != end; ++it) {
    cout << it->position() << ": " << it->str() << endl;
}

This gives the output:

0: 2
1: 2
2: 25

But it does not find the third single-character match, 2: 2. I prefer to use a regex because of the O(n) complexity when searching for several tokens at the same time.

UPDATE:

Maybe I could split the token list into lists in which no token is a prefix of another, and create one regex per list? For example: (2|4|25|45|251|455|267) => (2|4), (25|45|267), (251|455). This would increase the complexity to something like O(n log(m)).

UPDATE 2:

To answer this question, please provide a short STL-based algorithm for splitting a token vector into vectors in which no token is a prefix of another.
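One possible reading of the split requested above, as a minimal sketch: put each token into the group whose index equals the number of other tokens that are proper prefixes of it. Two tokens in the same group then cannot be prefixes of one another. The function name split_by_prefix_depth is hypothetical, and the grouping rule is an assumption inferred from the (2|4), (25|45|267), (251|455) example, not something the question states.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of the STL): groups[d] holds every token
// that has exactly d other tokens as proper prefixes.
std::vector<std::vector<std::string>>
split_by_prefix_depth(std::vector<std::string> tokens) {
    // After sorting, all tokens sharing a prefix p follow p contiguously.
    std::sort(tokens.begin(), tokens.end());
    tokens.erase(std::unique(tokens.begin(), tokens.end()), tokens.end());

    std::vector<std::vector<std::string>> groups;
    std::vector<std::string> chain;  // tokens that are prefixes of the current one
    for (const std::string& t : tokens) {
        // Drop chain entries that are not a prefix of t.
        while (!chain.empty() &&
               t.compare(0, chain.back().size(), chain.back()) != 0) {
            chain.pop_back();
        }
        const std::size_t depth = chain.size();
        if (groups.size() <= depth) groups.resize(depth + 1);
        groups[depth].push_back(t);
        chain.push_back(t);
    }
    return groups;
}
```

For the token list from the example, this should produce the groups {2, 4}, {25, 267, 45} and {251, 455}; tokens inside a group come out in lexicographic order. Each group can then be joined with | into its own regex.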

asked Oct 15 '15 07:10 by k06a




2 Answers

I don't think it's possible with an iterator and a single regexp. Here's how it works.

Your regexp searches for a substring that is either "2" or "25". Now, you start the search with sregex_iterator. It starts at the first character of the string and tries to find a match for your regular expression. If there is a match, it is "recorded", and the iterator is advanced to the position just after the match. If there is no match at a given position, the search moves one position forward. This process continues until the end of the string is reached.

Now, each time it finds a match, it reports only one match starting at that position (in your output, 25 rather than 2) and then resumes after it. So if a substring matches both 2 and 25, only one of them is reported and the other is skipped. So I'd say you need 2 regular expressions.
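A minimal sketch of that idea, assuming one std::regex per token: at every start position, each pattern is tried on its own with std::regex_constants::match_continuous, so a match of one token can no longer hide a match of another. The function name find_all_matches is hypothetical, and this costs roughly one search per position per pattern, so it is not the single O(n) pass the question hopes for.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (position, matched text) for every pattern
// that matches starting exactly at each position of s.
std::vector<std::pair<std::size_t, std::string>>
find_all_matches(const std::string& s, const std::vector<std::regex>& patterns) {
    std::vector<std::pair<std::size_t, std::string>> out;
    for (auto first = s.begin(); first != s.end(); ++first) {
        for (const std::regex& rx : patterns) {
            std::smatch m;
            // match_continuous: the match must begin at `first`.
            if (std::regex_search(first, s.end(), m, rx,
                                  std::regex_constants::match_continuous)) {
                out.emplace_back(static_cast<std::size_t>(first - s.begin()),
                                 m.str());
            }
        }
    }
    return out;
}
```

Called with the string "2225" and the two patterns 2 and 25, this should report 2 at positions 0, 1 and 2, plus 25 at position 2.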

answered Oct 25 '22 06:10 by SingerOfTheFall


You can't obtain the third '2' in a single pass, because each search returns only one match per starting position (here 25, the longer one) and the next search resumes after it. In order to get "all the possible matches" you need to run the query two times, since 2 is contained in 25.

answered Oct 25 '22 06:10 by deight